CAHSI 2010 ANNUAL MEETING

COMPUTING ALLIANCE OF HISPANIC-SERVING INSTITUTIONS

PROCEEDINGS

Contributing to the National Research Agenda

Sponsored By:

CAHSI is funded by NSF Grants #CNS-0540592 and #CNS-0837556

RECRUITING, RETAINING, ADVANCING HISPANICS IN COMPUTING

TABLE OF CONTENTS

WELCOME 05

CAHSI HISTORY 06

CONFERENCE AGENDA 07

PANEL AND DEVELOPMENT WORKSHOP DESCRIPTIONS 09

STUDENT PAPERS 12

1. Multi-Agent Simulation using Distributed Linux Cluster (Nathan Nikotan, CSU-DH)

2. Using Video Game Concepts to Improve the Educational Process (Daniel Jaramillo, NMSU)

3. Finding Patterns of Terrorist Groups in Iraq: A Knowledge Discovery Analysis (Steven Nieves, UPR-Politécnic)

4. Morphological Feature Extraction for Remote Sensing (José G. Lagares, UPR-Politécnic)

5. Unsupervised Clustering of Verbs Based on the Tsallis-Torres Entropy Text Analyzing Formula (Gina Colón, UPR-Politécnic)

6. Document Classification using Hierarchical Temporal Memory (Roxana Aparicio, et al., UPR-M)

7. Low Power Software Techniques for Embedded Real Time Operating Systems (Daniel Mera, et al., UPR-M)

8. Object Segmentation in Hyperspectral Images (Susi Huamán, et al., UPR-M)

9. Hyperspectral Texture Synthesis by 3D Wavelet Transform (Néstor Diaz, et al., UPR-M)

10. Routing Performance in an Underground Mine Environment (Joseph Gonzalez, UPR-M)

11. Leveraging Model Fusion to Improve Geophysical Models (Omar Ochoa, et al., UTEP)

12. Materialography Using Image Processing with OpenCV and C# (Oscar Alberto Garcia, et al., UPR-M)

13. Network Security: A Focus in the IEEE 802.11 Protocols (Brendaliz Román, UPR-Politécnic)

14. A Specification and Pattern System to Capture Scientific Sensor Data Properties (Irbis Gallegos, UTEP)

15. An Assistive Technology Tool for Text Entry based on N-gram Statistical Language Modeling (Anas Salah Eddin, et al., FIU)

16. Detecting the Human Face Vasculature Using Thermal Infrared Imaging (Ana M. Guzman, et al., FIU)

17. An Incremental Approach to Performance Prediction (Javier Delgado, FIU)

18. Using Genetic Algorithms for Scheduling Problems (Ricardo Macias, FIU)

19. Semantic Support for Weather Sensor Data (Jessica Romo, et al., UTEP)

20. STg: Cyberinfrastructure for Seismic Travel Time Tomography (Cesar Chacon, et al., UTEP)

21. Visualization of Inversion Uncertainty in Travel Time Seismic Tomography (Julio C. Olaya, UTEP)

22. Serious Games for 3D Seismic Travel Time Tomography (Ivan Gris, UTEP)

23. Semantic Support for Research Projects (Maria E. Ordonez, UTEP)

24. Advanced Wireless Mesh Networks: Design and Implementation (Julio Castillo, UPR-M)

STUDENT POSTERS (available online @ cahsi.org/4thannualposters) 104

1. Multi-Agent Simulation using Distributed Linux Cluster (Nathan Nikotan, CSU-DH)

2. Using Video Game Concepts to Improve the Educational Process (Daniel Jaramillo, NMSU)

3. Finding Patterns of Terrorist Groups in Iraq: A Knowledge Discovery Analysis (Steven Nieves, UPR-Politécnic)

4. Morphological Feature Extraction for Remote Sensing (José G. Lagares, UPR-Politécnic)

5. Unsupervised Clustering of Verbs Based on the Tsallis-Torres Entropy Text Analyzing Formula (Gina Colón, UPR-Politécnic)

6. Document Classification using Hierarchical Temporal Memory (Roxana Aparicio, et al., UPR-M)

7. Low Power Software Techniques for Embedded Real Time Operating Systems (Daniel Mera, et al., UPR-M)

8. Object Segmentation in Hyperspectral Images (Susi Huamán, et al., UPR-M)

9. Hyperspectral Texture Synthesis by 3D Wavelet Transform (Néstor Diaz, et al., UPR-M)

10. Routing Performance in an Underground Mine Environment (Joseph Gonzalez, UPR-M)

11. Leveraging Model Fusion to Improve Geophysical Models (Omar Ochoa, et al., UTEP)

12. Materialography Using Image Processing with OpenCV and C# (Oscar Alberto Garcia, et al., UPR-M)

13. Network Security: A Focus in the IEEE 802.11 Protocols (Brendaliz Román, UPR-Politécnic)

14. Towards a Systematic Approach to Create Accountable Scientific Systems (Leonardo Salayandia, UTEP)

15. A Specification and Pattern System to Capture Scientific Sensor Data Properties (Irbis Gallegos, UTEP)

16. An Assistive Technology Tool for Text Entry based on N-gram Statistical Language Modeling (Anas Salah Eddin, et al., FIU)

17. Detecting the Human Face Vasculature Using Thermal Infrared Imaging (Ana M. Guzman, et al., FIU)

18. An Incremental Approach to Performance Prediction (Javier Delgado, FIU)

19. Using Genetic Algorithms for Scheduling Problems (Ricardo Macias, FIU)

20. Semantic Support for Weather Sensor Data (Jessica Romo, et al., UTEP)

21. STg: Cyberinfrastructure for Seismic Travel Time Tomography (Cesar Chacon, et al., UTEP)

22. Visualization of Inversion Uncertainty in Travel Time Seismic Tomography (Julio C. Olaya, UTEP)

23. Serious Games for 3D Seismic Travel Time Tomography (Ivan Gris, UTEP)

24. Semantic Support for Research Projects (Maria E. Ordonez, UTEP)

25. Advanced Wireless Mesh Networks: Design and Implementation (Julio Castillo, UPR-M)

26. Supporting Scientific Collaboration Through the Flow of Information (Aída Gándara, UTEP)

27. Caching to Improve Provenance Visualization (Hugo D. Porras, et al., UTEP)

28. Prospec 2.1: A Tool for Generating and Validating Formal Specifications (Jesus Nevarez, UTEP)

29. Checking for Specification Inconsistencies: A Prospec Component (Jorge Mendoza, et al., UTEP)

30. Integrating Autodesk Maya to XNA (Carlos R. Lacayo, UH-D)

31. VizBlog: From Java to Flash Deployment (Joralis Sánchez, UPR-M)

32. Entropy Measures Techniques to Characterize the Vocalizations of Synthetic and Natural Neotropical Anuran (Marisel Villafañe, et al., UPR-M)

33. Predicting Survival Time From Genomic Data (María D. González Gil, UPR-M)

34. WIMS Cochlear Implant: Support to Test Electrode Array (Wilfredo O. Cartagena-Rivera, et al., UPR-M)

35. Hyperspectral Image Analysis for Abundance Estimation using CUDATM (Amilcar González, et al., UPR-M)

36. iPhone-based Digital Streaming Data Exchange for Species Classification in a Mesh Wireless Sensor Network (Nataira Pagán, et al., UPR-M)

37. Genetic Sequence Editor (Rey D. Sánchez, UTPA)

38. A Web-based User Interface for the Time-Frequency Representation of Environmental Bio-acoustics Signals (Laura M. Matos, et al., UPR-M)

39. KPAG Software, from Kronecker Product Algebra to Graphs (Richard Martínez Sánchez, UPR-RP)

40. Parallax: A Progress Indicator for MapReduce Pipelines (Kristi Morton, et al., U of Washington)

41. Visual Comparison Tool for Aggregate Data (Hooman Hemmati, et al., UH-D)

42. Teaching Entry-Level Programming Concepts to Aspiring CS Majors through RPG Maker VX Games (David Salas, NMSU)

43. Video Game Design and Development Using the G.E.C.K (Bretton Murphy, UH-D)

44. Text Visualization Using Computer Software (Rafael Cruz Ortiz, UH-D)

45. Using Panda3D to Create 3D Games (Jeremiah Davis, NMSU)

46. Tracking Moving Objects Using Two Synchronous Cameras (Diego Rojas, TAMU-CC)

47. GPU Programming: Transferring the Data or Recomputing in a Many-body Problem (Axel Y. Rivera Rodríguez, UPR-H)

48. A Comparison of Text based languages and Visually based IDEs used for creating Computer Games (Richard G. Trujillo, UPR-RP)

49. Live: Could you be at Risk While Playing Online? (Jeremy Cummins, et al.)

50. Hyperspectral Image Processing Algorithms for Cancer Detection on CUDA/GPUs (Christian Sánchez-López, et al., UPR-M)

51. Performance of Routing Protocols in Unmanned Aerial Vehicles (Deisi Ayala, CSU-DH)

SPEAKER BIOGRAPHIES 107

Full proceeding available for download online at: http://www.cahsi.org/reports

THE UNIVERSITY OF TEXAS AT EL PASO

Ann Q. Gates, Associate Vice President for Research

Dear 2010 CAHSI Participants:

We were especially pleased that Microsoft hosted the 4th Annual CAHSI meeting, "Contributing to the National Research Agenda," at its Redmond campus. What a perfect setting for a meeting focused on the national research agenda.

The CAHSI meeting highlighted outstanding Hispanic researchers through plenary talks, panels, and workshops; the poster session highlighted the excellent research being conducted by CAHSI undergraduate and graduate students. We had a dynamic and motivating program thanks to participants from IBM, Oracle, Intel, Lawrence Berkeley Laboratories, the GEM Consortium, and the Hispanic Scholarship Fund. Other organizations that contributed to the CAHSI meeting included Latinas in Computing, CMD-IT, CRA-W, and AccessComputing.

Thank you for making a difference by becoming involved with CAHSI.

Sincerely,

Ann Quiroz Gates, Ph.D. CAHSI PI

500 W. University Admin 209 El Paso, Texas 79968-0500 (915) 747-5680 FAX: (915) 747-6474

CAHSI HISTORY

ABOUT

With over ten years of collaborative experience, seven Hispanic-Serving Institutions (HSIs) formed the Computing Alliance of Hispanic-Serving Institutions (CAHSI) in 2004 with the aim of unifying efforts to address the underrepresentation of Hispanics in computing and leadership roles at all levels of the academic pipeline. This alliance was formalized in 2006 through funding from the NSF Broadening Participation in Computing (BPC) program. CAHSI institutions (founding institutions are in italics) include: California State University-Dominguez Hills (CSU-DH), California State University-San Marcos (CSU-SM), Dade College (DC), Florida International University (FIU), New Mexico State University (NMSU), Texas A&M University-Corpus Christi (TAMU-CC), University of Houston Downtown (UHD), University of Puerto Rico Mayaguez (UPRM), University of Texas at El Paso (UTEP), and University of Texas Pan American (UTPA). The goals of CAHSI have remained constant since its inception: (1) increase the number of Hispanic students who enter the computing workforce with advanced degrees; (2) support the retention and advancement of Hispanic students and faculty in computing; and (3) develop and sustain competitive education and research programs. The goals of the CAHSI extension project build on the original goals and aim to extend CAHSI’s impact and sustainability through strategic and mutually beneficial collaborations and partnerships. The CAHSI extension project goals are: (1) institute a sustainable infrastructure that supports CAHSI’s continued impact and (2) become recognized as an organization that affects decision-making and cultural change at the local, regional, and national levels.

MOTIVATION

CAHSI’s motivation lies in the rapid growth of the Hispanic population and the urgency of building a U.S. computing workforce to maintain the nation’s prominence in technology. Hispanics are the largest, youngest, and fastest growing minority group in the United States (Pew 2009), yet in 2008-2009 only 5.8% of all Bachelor’s degree recipients and 1.4% of Ph.D. recipients in computing were Hispanic—far below equity (Zweben 2009). CAHSI’s efforts are focused on increasing the number of Hispanics who enter computing, graduate from baccalaureate programs, complete advanced degrees, and advance in their careers by implementing effective practices that address causes for underrepresentation.

MAKING A DIFFERENCE

CAHSI has been making a difference by sharing resources and promoting effective, evidence-based practices. Contrary to national trends, CAHSI graduated 157 more CS baccalaureates than expected last year, and 40% were Hispanic, a rate almost three times the regional average. Since 2002, bachelor's degree production in CS in North America has decreased by 39%, while seven CAHSI campuses have increased their CS graduation rates by 25% over the same period. It is important to note that six of the seven CAHSI founding members were funded through the NSF Minority Institution Infrastructure (MII) program for over a decade beginning in the early 1990s, and three have succeeded in obtaining the highly competitive NSF-CREST funding, which ultimately shows the progression and potential of these CAHSI institutions in integrating research with education and in moving towards high research activity. Many CAHSI members have collaborated on numerous projects for the benefit of all. Through collaborative partnerships and networks, CAHSI's focus has extended well beyond HSIs and computing.

CONFERENCE AGENDA

MONDAY, April 5, 2010

1:00 pm – 2:00 pm
REGISTRATION AND LUNCH (conference agenda provided at the front desk)
Location: Hotel/Microsoft 122

2:00 pm – 6:00 pm
CONCURRENT SESSION A: GEM Grad Workshop / HSF Scholarships
Jacqueline Thomas, GEM Consortium; Paco Flores, Hispanic Scholarship Fund
Location: Microsoft 122/1650-Chelan

CONCURRENT SESSION B: Growing Your Research Program through Leadership, Networking, Collaboration, and Funding
Dr. Patty Lopez, Intel; Dr. Gilda Garretón, Oracle; Dr. Valerie Taylor, Texas A&M U. (CMD-IT)
Location: Microsoft 122/1775-1800

CONCURRENT SESSION C: CAHSI Administrative Meeting (closed meeting)
Location: Microsoft 122/1720-Tillicum

7:00 pm – 9:00 pm
RECEPTION AND STUDENT POSTER SESSION
Plenary Talk: Dr. Henry Jerez, Senior Program Manager, Microsoft
Location: Spit Fire

TUESDAY, April 6, 2010

7:00 am – 8:00 am
CONTINENTAL BREAKFAST
Location: Hotel

8:00 am – 8:15 am
WALK TO MICROSOFT (see directions below)

8:30 am – 8:45 am
WELCOME AND INTRODUCTION
Dr. Ann Gates, University of Texas at El Paso

8:45 am – 9:45 am
KEYNOTE
Dr. Juan Vargas, Research Program Manager, Microsoft

9:45 am – 10:00 am
BREAK

10:00 am – 11:00 am
STUDENT PANEL: How to Prepare and Make Yourself Marketable
Moderator: Dr. Nayda Santiago, UPRM; Irbis Gallegos, University of Texas at El Paso; Anas Salah Eddin, Florida International University; Marisel Villafañe, University of Puerto Rico Mayaguez
Location: Microsoft 122/1600-1650

11:00 am – 12:00 pm
PROFESSIONAL PANEL: Industry Research and Development Career Path
Moderator: Dr. John Fernandez, TAMUCC; Dr. Patty Lopez, Intel; Dr. Gilda Garretón, Oracle; Dr. Dina Requena, IBM

12:00 pm – 1:00 pm
LUNCH
Location: Commons

1:00 pm – 2:00 pm
PLENARY TALK
Dr. Cecilia Aragon, Computational Research Division at Berkeley National Laboratory
Location: Microsoft 122/1600-1650

2:00 pm – 3:30 pm
CONCURRENT SESSION A: Student Advocate Workshop (by invitation)
Bruce Edmunds, Program Manager, CAHSI; Ivan Gris, UTEP CAHSI Advocate
Location: Microsoft 122/1775-1800

CONCURRENT SESSION B: Creating a Research Plan
Dr. Malek Adjouadi, Florida International University
Location: Microsoft 122/1720-Tillicum

CONCURRENT SESSION C: Affinity Research Group Workshop I
Dr. Ann Gates, Elsa Villa, and Dr. Steve Roach, University of Texas at El Paso
Location: Microsoft 122/1650-Chelan

3:30 pm – 3:45 pm
BREAK

3:45 pm – 5:15 pm
CONCURRENT SESSION A: Computer Security
Dr. Mohsen Beheshti, California State University-Dominguez Hills
Location: Microsoft 122/1775-1800

CONCURRENT SESSION B: Tips on Solid Writing
Dr. Steve Roach, University of Texas at El Paso
Location: Microsoft 122/1720-Tillicum

CONCURRENT SESSION C: Affinity Research Group Workshop II
Dr. Ann Gates and Elsa Villa, University of Texas at El Paso
Location: Microsoft 122/1650-Chelan

6:00 pm –
DINNER (on your own)

CONFERENCE AGENDA CONTINUED

WEDNESDAY, April 7, 2010

7:00 am – 8:00 am
CONTINENTAL BREAKFAST
Location: Hotel

8:00 am – 8:15 am
WALK TO MICROSOFT

8:30 am – 10:00 am
CONCURRENT SESSION A: Cognitive Radio and Cognitive Networks
Dr. Lizdabel Morales, University of Puerto Rico Mayaguez
Location: Microsoft 122/1775-1800

CONCURRENT SESSION B: Panel: Developing Entrepreneurial, Management, and Leadership Skills for Students in Computing Fields
Facilitators: Arely Mendez and Maria Elena Ordoñez, UTEP; Panelists: Dr. Bradley Jensen, Microsoft; Aida Gandara, UTEP; Diego Rojas, TAMU-CC; Amber Faucett, TAMU-CC
Location: Microsoft 122/1720-Tillicum

10:00 am – 11:30 am
TOUR OF MICROSOFT
Location: Microsoft Research / Building 92

11:30 am –
BOX LUNCHES
Location: Building 92


PANEL AND DEVELOPMENT WORKSHOP DESCRIPTIONS

APPLYING TO GRADUATE SCHOOL AND SCHOLARSHIPS
Date: Monday, April 5
Time: 2:00 pm – 6:00 pm
Target audience: Undergraduate students
Speakers: Jacqueline Thomas, National GEM Consortium; Paco Flores, Hispanic Scholarship Fund

The National GEM Consortium and the Hispanic Scholarship Fund are sponsoring a workshop designed to excite and encourage promising undergraduate engineering and science students to consider graduate research programs. The workshop will encourage students' consideration of graduate computing and engineering schools by delivering vital information on the importance of research and innovation, the process of choosing a graduate program, and preparing an application. The workshop will end with a discussion of applying for scholarships through the GEM Consortium and the Hispanic Scholarship Fund.

GROWING YOUR RESEARCH PROGRAM THROUGH LEADERSHIP, NETWORKING, COLLABORATION, AND FUNDING
Date: Monday, April 5
Time: 2:00 pm – 6:00 pm
Target audience: Graduate students and faculty
Speakers: Dr. Patty Lopez, Intel Corporation; Dr. Gilda Garretón, Oracle; Dr. Valerie Taylor, Texas A&M University

Latinas in Computing and CMD-IT, in collaboration with CRA-W, are sponsoring a workshop designed to share experiences on how graduate students and faculty can build a research program. The workshop focuses on how to become a leader, the leadership roles of a researcher, and what it takes to lead a successful project. In addition, it will cover the ins and outs of networking and building productive collaborations. The workshop ends with the essentials of writing successful proposals. The presenters share their valuable experiences in developing successful research programs and projects from academic and industry perspectives.

HOW TO PREPARE AND MAKE YOURSELF MARKETABLE
Date: Tuesday, April 6
Time: 10:00 am – 11:00 am
Target audience: Undergraduate and graduate students
Speakers: Irbis Gallegos, University of Texas at El Paso; Anas Salah Eddin, Florida International University; Marisel Villafañe, University of Puerto Rico Mayaguez

Students at all stages of their education should be thinking about seeking opportunities that can build their expertise, develop their skills, and provide experiences that make them marketable. This includes taking leadership roles in student organizations, becoming involved in activities that make a difference at the university or in the community, attending conferences, having research experiences within and outside the university, and establishing a professional network. The panel is comprised of undergraduate and graduate students who will share the things that they have done and are doing to make themselves marketable.


INDUSTRY RESEARCH AND DEVELOPMENT CAREER PATH
Date: Tuesday, April 6
Time: 11:00 am – 12:00 pm
Target audience: Undergraduate and graduate students
Speakers: Dr. Patty Lopez, Intel; Dr. Gilda Garretón, Oracle; Dr. Dina Requena, IBM

There are a number of different career paths that one can take with a Ph.D. degree. This panel focuses on the path that leads to research and development in an industry setting. The panel includes three accomplished researchers from Intel, Oracle, and IBM. They will share their experiences and discuss the importance of collaborations, technology transfer, innovation, and administration.

STUDENT ADVOCATE ORIENTATION WORKSHOP (BY INVITATION)
Date: Tuesday, April 6
Time: 2:00 pm – 3:30 pm
Target audience: Student and faculty advocates
Speakers: Bruce Edmunds, Program Manager, CAHSI; Ivan Gris, UTEP CAHSI Advocate

Make an impact by becoming a CAHSI advocate! This interactive workshop will prepare students and faculty to become advocates. Student advocates work with students at their home institutions by encouraging and facilitating student participation in REU opportunities, seminars, workshops, and internships. Working with faculty, they involve students in activities that assist them in preparing competitive applications to local and external programs, scholarships, and fellowships. The role of faculty advocates is to promote Hispanic faculty and young professionals into leadership roles. This includes award nominations and making recommendations for key committee positions, panels, and other positions that build leadership.

CREATING A RESEARCH PLAN
Date: Tuesday, April 6
Time: 2:00 pm – 3:30 pm
Target audience: Undergraduate students
Speaker: Dr. Malek Adjouadi, Florida International University

A research plan is created to communicate one’s research goals and aspirations. Research plans are written for a number of different audiences, e.g., a proposal review panel, a committee that reviews fellowship applicants, and a search committee reviewing applications for a research or faculty position. The focus of this workshop is on developing a research plan for a fellowship application. It will cover the importance of defining the motivation and significance of the work, articulating research questions and goals, describing initial results, and presenting the approaches and methods used to conduct your research.


AFFINITY RESEARCH GROUP WORKSHOP I
Date: Tuesday, April 6
Time: 2:00 pm – 3:30 pm
Target audience: Faculty
Speakers: Dr. Ann Gates, Elsa Villa, and Dr. Steve Roach, University of Texas at El Paso

ARG Workshop I (Part I of a two-part workshop): The workshop introduces the Affinity Research Group model through a session focused on “Setting Research Goals and Objectives.” Participants learn about the ARG philosophy, and the ARG’s focus on the deliberate development of skills.

COMPUTER SECURITY
Date: Tuesday, April 6
Time: 3:45 pm – 5:15 pm
Target audience: Undergraduate and graduate students
Speaker: Dr. Mohsen Beheshti, California State University-Dominguez Hills

Computer networks and equipment are being compromised every day. This project's concern is network security in the areas of monitoring and defending vital information and computer equipment from attacks. CSRL is conducting research on Information Fusion in Sensor-Based Intrusion Detection Systems. The project deploys real equipment that is monitored with different detection systems and left open to attack. As information is gathered, the data from separate sources is analyzed and compared against a fusion of the sources. The CSRL has also focused on improving network security through the development of an Intrusion Detection and Prevention System (IDPS).

TIPS ON SOLID WRITING
Date: Tuesday, April 6
Time: 3:45 pm – 5:15 pm
Target audience: Undergraduate and graduate students
Speaker: Dr. Steve Roach, University of Texas at El Paso

The workshop provides students with a few valuable insights into the creation of a credible, readable document. Writing is important because, quite simply, appearance counts. Your technical paper or proposal may be the first glimpse a member of the faculty or research community will have of you and, in many cases, perhaps the only contact you will make. Your writing must present a positive image of you and your work, just as though you were actually presenting the paper in person. Therefore, it should not be sloppy, but must, instead, convey a sure sense of maturity, professionalism, and confidence. In a word, it must be solid. The workshop will introduce and briefly address many common mistakes in writing.

AFFINITY RESEARCH GROUP WORKSHOP II
Date: Tuesday, April 6
Time: 3:45 pm – 5:15 pm
Target audience: Faculty
Speakers: Dr. Ann Gates and Elsa Villa, University of Texas at El Paso

ARG Workshop II (Part II of a two-part workshop): The workshop will review the essential components of an ARG and introduce a broad range of activities associated with ARGs. Participants who attend Parts I and II of the workshop will receive an ARG Handbook.

STUDENT PAPERS

1. Multi-Agent Simulation using Distributed Linux Cluster (Nathan Nikotan, CSU-DH)

2. Using Video Game Concepts to Improve the Educational Process (Daniel Jaramillo, NMSU)

3. Finding Patterns of Terrorist Groups in Iraq: A Knowledge Discovery Analysis (Steven Nieves, UPR-Politécnic)

4. Morphological Feature Extraction for Remote Sensing (José G. Lagares, UPR-Politécnic)

5. Unsupervised Clustering of Verbs Based on the Tsallis-Torres Entropy Text Analyzing Formula (Gina Colón, UPR-Politécnic)

6. Document Classification using Hierarchical Temporal Memory (Roxana Aparicio, et al., UPR-M)

7. Low Power Software Techniques for Embedded Real Time Operating Systems (Daniel Mera, et al., UPR-M)

8. Object Segmentation in Hyperspectral Images (Susi Huamán, et al., UPR-M)

9. Hyperspectral Texture Synthesis by 3D Wavelet Transform (Néstor Diaz, et al., UPR-M)

10. Routing Performance in an Underground Mine Environment (Joseph Gonzalez, UPR-M)

11. Leveraging Model Fusion to Improve Geophysical Models (Omar Ochoa, et al., UTEP)

12. Materialography Using Image Processing with OpenCV and C# (Oscar Alberto Garcia, et al., UPR-M)

13. Network Security: A Focus in the IEEE 802.11 Protocols (Brendaliz Román, UPR-Politécnic)

14. A Specification and Pattern System to Capture Scientific Sensor Data Properties (Irbis Gallegos, UTEP)

15. An Assistive Technology Tool for Text Entry based on N-gram Statistical Language Modeling (Anas Salah Eddin, et al., FIU)

16. Detecting the Human Face Vasculature Using Thermal Infrared Imaging (Ana M. Guzman, et al., FIU)

17. An Incremental Approach to Performance Prediction (Javier Delgado, FIU)

18. Using Genetic Algorithms for Scheduling Problems (Ricardo Macias, FIU)

19. Semantic Support for Weather Sensor Data (Jessica Romo, et al., UTEP)

STUDENT PAPERS (CONT'D)

20. STg: Cyberinfrastructure for Seismic Travel Time Tomography (Cesar Chacon, et al., UTEP)

21. Visualization of Inversion Uncertainty in Travel Time Seismic Tomography (Julio C. Olaya, UTEP)

22. Serious Games for 3D Seismic Travel Time Tomography (Ivan Gris, UTEP)

23. Semantic Support for Research Projects (Maria E. Ordonez, UTEP)

24. Advanced Wireless Mesh Networks: Design and Implementation (Julio Castillo, UPR-M)

Multi-Agent Simulation using Distributed Linux Cluster

Nathan Nikotan
Computer Science Department, California State University Dominguez Hills
1000 E. Victoria Street, Carson, CA 90747
[email protected]

ABSTRACT

This paper focuses on implementing a multi-agent simulation using a distributed Linux cluster for swarm intelligence between multiple agents in the context of a battlefield or hostile area scenario. Different aspects of several different papers can provide ways in which to develop an autonomous agent algorithm. An autonomous agent algorithm will have the group of individual worker agents collectively making decisions based on available information over an encrypted communication link. The group will have to assess the threat level and provide an appropriate defensive response. Each worker agent will provide a time estimate and a probability estimate to a broker agent, who publishes the results and assigns tasks using a bidding process.

1. Introduction

An agent is a process or program that acts autonomously based on its role assignment. An agent system is a platform that creates, interprets, executes, transfers, and terminates agent processes. [1] Fielding through papers on the subject, multi-agent systems have been applied to computation-intensive problems. This paper presents a simulation for defender/extractor machines based on swarm intelligence using an autonomous agent algorithm. An autonomous agent algorithm will have the group of individual agents collectively making decisions based on available information over an encrypted communication link. The group will have to assess the threat level and provide an appropriate defensive response. Each worker agent will provide a time estimate and a probability estimate to a broker agent, who publishes the results and assigns tasks using a bidding process. The broker agent can assign a worker agent to be a defender-machine (i.e., a combat agent) or an extractor-machine (i.e., a rescue agent).

2. Multi-Agent Simulation

The following is the methodology on which the multi-agent simulator was based: communication messaging between distributed agents.

2.1 Multi-Agent Systems

A multi-agent system has been applied to autonomous proof theorem construction, wherein the cooperation between reasoning components such as "a personal assistant agent, multiple proof agent, and a broker agent" is used to achieve a distributed problem-solving goal. [5] An autonomous agent algorithm simulator implements a dynamic API over agents through the use of an agent communication language. This autonomy should theoretically reduce network load and communication overhead in distributed applications. Mobility means the ability to move from one execution environment to another. Autonomy means that mobile agents can act "autonomously," performing actions based on their own assessment of circumstances, such as when and where to move. Function shipping is when the processing method is delivered to the location where the data is stored. Data shipping is when the data is delivered to the location where the function method is stored. [1]

2.2.1 Agent Communication Language

An agent communication language enables the agent to communicate its intention to other [agents] in order to find sources of information (or services) to fulfill its mission. [1] Communication messaging challenges include security, resource management, and running the agent. Communication can be developed initially over a LAN, then extended to a wireless platform. An agent system handles context information using:
1. Serialization of the execution stack
2. Validity of object references
Multi-threading is used to handle mobility issues such as overhead bandwidth and potential deadlocking. An agent system can provide mechanisms to properly track (and re-establish on migration) object references needed by an agent.

2.2.2 Simple Distributed Service Model

Applications can make use of distributed event monitoring. [2] Agents are autonomous entities that can encapsulate and enforce local policies. Autonomous agents are capable of learning and adapting to new and modified global policies that dictate interactions among the group membership. Mechanisms of self-configuration, self-monitoring, and recovery can be encapsulated into Java classes.

Different roles within the group membership can be represented as a set of privileges. These privileges can be dynamically constrained by associating event-driven conditions with role-based operations. Agent policies can include, but are not limited to: a) role admission control, b) role operation preconditions, c) object access control, and d) event subscription and event modification between agent roles. Different agent services can include: a) configuration service, b) discovery service, c) location service, d) failure monitoring service, e) name service, and f) authentication service. [2]

In Figure 1, a high-availability, high-performance cluster can be implemented with a server node as the broker agent and seven worker agents. MPICH and mpiJava can be configured over a Linux network to provide a message-passing interface between the broker agent and the worker agents. Notice the highAvailability connection; it is accomplished with sufficient hardware network interface cards. For fault tolerance, the primary node is supervised by a stand-by node prepared for a take-over operation given a fault signal detected on the highAvailability connection. The server BrokerAgent node is responsible for servicing the requests of the client workerAgent nodes.

2.3 Multi-Agent Systems and Distributed Services

Di Fatta [3] introduced a general Multi-Agent System framework that communicates over a peer-to-peer architecture. Agent protocols are implemented using asynchronous message communication over a cluster of workstations, providing a way to parallelize data processing. Di Fatta outlined an agent paradigm with various agents (control manager agent, job manager agent, data manager agent, worker agent, and coordinator agent) undergoing an execution strategy with three distinct phases.

These three phases can be applied to the simple distributed service model. In the first (Bootstrap) phase, the brokerAgent waits for join requests from all available workerAgents. Once a join request is accepted, a workerAgent receives a configuration file and sends a BarrierRequest message to synchronize all workerAgents in a bidding process; this bidding process is discussed shortly. In the second (Computing) phase, the BrokerAgent creates the first task, from which other tasks originate; new tasks are spawned dynamically, and a workerAgent requests a new task when it is idle. The third (Reduction) phase requires the workerAgents to transmit their results to the BrokerAgent. [3]

2.3.1 Bidding Process

Du [5] introduced a prototype implementation for proof theorem construction in which different aspects can be applied to the simple distributed service model. Du described "a requesting proof agent wishing to engage in a bidding process that provides estimates." [5] A server interface supplies two types of information to other agents: proofs and estimates. This provision of estimates is referred to as the "bidding process." The estimates consist of a time estimate and a probability estimate. Applied to the simple distributed service model, given a set of estimates E = { e(t1,p1), …, e(tn,pn) } and a required time constraint Tr, one subset C of E can be assigned combatAgent processes, while another subset of E can be assigned rescueAgent processes.
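The bidding process above is described only in prose; as a rough illustration, the following minimal Java sketch shows how a broker agent could turn worker estimates e(t, p) and a time constraint Tr into combatAgent/rescueAgent assignments. The Estimate record, the assignRoles method, and the half-and-half split are assumptions made for illustration, not details from the paper.

    import java.util.*;

    public class BiddingSketch {
        // A worker's bid: estimated completion time t and success probability p.
        record Estimate(String workerId, double time, double probability) {}

        // Workers whose time estimate fits within the constraint Tr are eligible;
        // here the most confident half is assigned combat duty and the rest rescue duty.
        static Map<String, String> assignRoles(List<Estimate> bids, double tr) {
            List<Estimate> eligible = new ArrayList<>();
            for (Estimate e : bids) {
                if (e.time() <= tr) eligible.add(e);
            }
            eligible.sort(Comparator.comparingDouble(Estimate::probability).reversed());

            Map<String, String> roles = new LinkedHashMap<>();
            for (int i = 0; i < eligible.size(); i++) {
                String role = (i < eligible.size() / 2) ? "combatAgent" : "rescueAgent";
                roles.put(eligible.get(i).workerId(), role);
            }
            return roles;
        }

        public static void main(String[] args) {
            List<Estimate> bids = List.of(
                new Estimate("w1", 4.0, 0.9),
                new Estimate("w2", 7.0, 0.6),
                new Estimate("w3", 3.0, 0.8));
            System.out.println(assignRoles(bids, 6.0)); // {w1=combatAgent, w3=rescueAgent}
        }
    }

In the paper's setting, the broker would additionally publish the resulting assignments back to the workers rather than just returning them locally.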

2.4 Simulation Network

Wilson [9] applied his D'Agents Mobile Agent System to a "hypothetical search and rescue mission." Simulations can operate on static datasets and data sources, but many simulations would benefit from being able to access dynamic data. Simulations and other resources may join or leave (the networked-simulation cloud) at any point in time. Consider a scenario in which a forest fire simulation demonstrates communication between various sensors in the field and a weather simulation running at a remote site. Sensors in a hostile environment may communicate sporadically with the rest of the network. Configuration interoperability refers to the ability of group members to discover one another and communicate back and forth. The D'Agents mobile agent system architecture consists of: a) user objects (simulation entities), b) generic local agents, c) mobile helper agents, and d) a broker agent. [9] User objects can be any producer(s) or consumer(s) of data. Simulations can be both producer(s) and consumer(s) of data.

2.4.1 Autonomous Agent Algorithm

The following is a simple autonomous agent algorithm in a simplified Java class annotation:

    Activity criticalMission {
        ActivationConstraint (E { e(t1,p1),…, e(tn,pn) }) & (0.10*Tr) | (RapidMeetingRequired)
        Object communicationCapabilities
        Object combatCapabilities
        Object rescueCapabilities
        Role brokerAgent {
            AdmissionConstraint #(agentMembership) > 1
            Operation calculateThreatLevel
                Action publishThreatLevel
            Operation calculateCofE
                Action assignCombatAgents
            Operation calculateRofE
                Action assignRescueAgents
            Operation calculateExtractionDistance
                Precondition (rescueAgent[i].targetAcquired)
                Action routeToExtraction
        }
        Role combatAgent {
            Operation calculateThreatLevel
                Action sendThreatLevel2Broker
            Operation calculateAggressorDistance
                Action routeShieldDefense
            Operation shieldDefense
                Action defenseResponse
            Operation calculateExtractionDistance
                Action routeToExtraction
        }
        Role rescueAgent {
            Operation calculateThreatLevel
                Action sendThreatLevel2Broker
            Operation calculateTargetDistance
                Action routeToTarget
            Operation targetAcquired
                Action send.agent[i].targetAcquired
            Operation calculateExtractionDistance
                Action routeToExtraction
        }
    }

2.4.2 Agent Task Service Model

In Figure 3, each workerAgent provides a capabilities list so that the brokerAgent can determine which agents will be better suited to complete defensive tasks and which agents will be better suited to complete extraction tasks.

Figure 3. Agent-Task Service Model

Figure 3 provides a system design for an autonomous agent system using Java RMI over a Linux cluster. Communication and membership must be established and maintained by a communication agent. Among multiple agents, the group must assess each agent's capabilities and, using a bidding process, each agent is assigned its best-suited tasks.

2.5 Simulated Battlefield Scenario

Figure 4 illustrates a battlefield or hostile area with an undetermined number of aggressors bearing down on a group of suppressed soldiers that require rescue extraction. The scenario is that autonomous agents are deployed, and these agents are able to receive tactical information from different sources. Within moments of deployment, a brokerAgent must monitor the communication links between multiple agents. combatAgents will provide a defensive shield, while rescueAgents must search for suppressed soldiers in order to transport them to a designated extraction site.

Figure 4. Multi-Agent System Battlefield
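Section 2.4.2 proposes Java RMI over a Linux cluster for broker/worker communication but does not show an interface; the following is a minimal sketch of what the remote broker contract might look like. The BrokerService name and its methods are assumptions for illustration, not an API defined by the paper.

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.util.List;

    // Remote contract a brokerAgent could expose to workerAgents over RMI.
    public interface BrokerService extends Remote {
        // A worker joins the group and advertises its capabilities (e.g., "combat", "rescue").
        void join(String workerId, List<String> capabilities) throws RemoteException;

        // A worker submits its bid for the current task: a time estimate and a success probability.
        void submitEstimate(String workerId, double time, double probability) throws RemoteException;

        // A worker asks which role the bidding process assigned to it.
        String assignedRole(String workerId) throws RemoteException;
    }

A broker-side implementation would typically extend java.rmi.server.UnicastRemoteObject and be bound in an RMI registry on the broker node so that workerAgents on other cluster nodes can look it up.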

3. Simulation Development

Figure 5. Simple MSA 1.0 Simulator

Figure 5 illustrates the MSA 1.0 Simulator, where the goal of the simulation is to rescue suppressed soldiers (green). Aggressors (red) have pinned down the suppressed soldiers, and rescueAgents (blue) have been deployed. A rescueAgent can also be assigned the role of combatAgent, which would have the defensive capabilities to provide a defensive shield between the aggressors and the targets of interest. As the rescueAgent locates the suppressed soldier(s), it can request coordinates for the extraction site from the brokerAgent.

4. Conclusion

Midway through the first phase of a three-phase development plan, a simple simulation provides the clustered network in which to develop an autonomous agent algorithm. Di Fatta delineated a Multi-Agent System framework that develops the concept of a broker agent processing information and providing a coordination mechanism. Then, Du introduced the concept of a "bidding process" based on various agent estimates and probabilities. And lastly, Wilson implemented his D'Agents Mobile Agent System in a "hypothetical search and rescue mission."

The first phase is to develop a simulation demonstrating: 1) a communication protocol between agents; 2) a bidding process for task assignment; 3) a decision-making process for determining group activity; and 4) group health assessment. The second phase is to develop a scaled version of the physical agents: different agents will have different capabilities. Lastly, the third phase would necessarily require greater funding in order to validate an autonomous system for battlefield conditions.

Acknowledgement

This paper is based on work supported by the National Science Foundation (NSF) through grant CNS-0837556.

References

1. Butte, T., Technologies for the Development of Agent-based Distributed Applications, ACM Crossroads, Volume 8, Issue 3, 2002, pp. 8-15, New York, NY.
2. Tripathi, A., Policy-Driven Configuration and Management of Agent Based Distributed Systems, Proceedings of the Fourth International Workshop on Software Engineering for Large-Scale Multi-Agent Systems, 2005, pp. 1-7, St. Louis, MO.
3. Di Fatta, G., Fortino, G., A Customizable Multi-Agent System for Distributed Data Mining, Proceedings of the 2007 ACM Symposium on Applied Computing, 2007, pp. 42-47, Seoul, Korea.
4. Benda, P., A Distributed Stand-in Agent based Algorithm for Opportunistic Resource Allocation, Proceedings of the 2006 ACM Symposium on Applied Computing, 2006, pp. 119-125, Dijon, France.
5. Du, T., Li, E., Mobile Agents in Distributed Network Management, Communications of the ACM, July 2003, Vol. 46, No. 7, pp. 127-132, New York, NY.
6. Hunter, C., Agent-Based Distributed Software Verification, Proceedings of the Twenty-eighth Australasian Conference on Computer Science, Volume 38, 2005, pp. 159-164, Newcastle, Australia.
7. Hill, J., Alford, K., A Distributed Task Environment for Teaching Artificial Intelligence with Agents, Proceedings of the 35th SIGCSE Technical Symposium on Computer Science Education, 2004, Norfolk, VA.
8. Roos, N., A Protocol for Multi-Agent Diagnosis with Spatially Distributed Knowledge, Proceedings of the Second International Joint Conference on Autonomous Agents and Multi-Agent Systems, 2003, pp. 655-661, Melbourne, Australia.
9. Smolko, D., Design and Evaluation of the Mobile Agent Architecture for Distributed Consistency Management, Proceedings of the 23rd International Conference on Software Engineering, 2001, pp. 799-800, Toronto, Ontario, Canada.
10. Wilson, L., Burroughs, D., et al., An Agent-Based Framework for Linking Distributed Simulations, Proceedings of the 32nd Winter Simulation Conference, 2000, pp. 1713-1721, Orlando, FL.

Using Video Game Concepts to Improve the Educational Process

Daniel Jaramillo and Dr. Karen Villaverde
Department of Computer Science, New Mexico State University
Las Cruces, NM 88003, USA
Credit given to NSF Grant CNS-0837556
Emails: [email protected], [email protected]

Abstract achievements, Progress Meters, unlockables, leaderboards, online communities and online resources This paper describes concepts that are are some things that video game designers use to make prominent in video games and how they can be applied their games engaging, fun, and challenging, much like to a classroom setting. By using just a few of these what students would like from their classes. The concepts, we found there was a positive impact on concepts address the concern of keeping a student student attendance and grades. The paper also motivated the way video games keep their players describes a research plan to implement more of these motivated.[4] This paper presents methods on concepts in later classes. implementing these ideas in a classroom setting and discusses their viability. 1. Introduction 2. Methods Video games are a young form of entertainment, having only been around for 60 years. 2.1 Achievements Video games now include many exciting visuals that immerse the player in a world that rivals that of a high An achievement in terms of a video game is budget movie. They combine many forms of accomplishing a certain task in the game and receiving entertainment like art, music, acting, and screenplays to recognition or a reward. For example, some of these create many different interactive worlds. Today, the can be as simple as “Obtaining 100 points” or “Beating video game industry has grown into a billion dollar the game.” Early in gaming history, many industry spanning hand-held devices, computers, and achievements were not officially recognized by the stand alone home consoles.[1] When looking to game designers. Now, with services like Microsoft's improve the education process, it seems unlikely that Xbox Live or Sony's Playstation Home, players receive the way video games are designed could help bring new points and medals that they can use to gauge their life into an age old system of teaching. However, when progress in a game and show off to their friends. looking to gameplay concepts of video games you can Achievements like “Complete the Game” earns the consider a student like a player and the classes like player points, a medal, or some other means of reward. games. This stretch is not too far off, considering you The use of achievements in games have been used to have students looking to improve their GPA by extend play time and motivate the player to play the obtaining the best scores possible on exams and classes. game in different ways, keeping player interest high. They are competing with other students for internships, Student interest is just as important to a professor as jobs and other limited resources, much like video game player interest is for the developer and integrating players looking to obtain high scores and recognition in achievements into the education process is a way to tournaments. You have students working together in keep the player or the student interested and motivated. teams to achieve specific goals or projects, like a team Some classes already do this, by rewarding students for of players trying to fight a boss in a game or to solve a high grades, then allowing them to use those rewards puzzle together. These players use team-work centric for other privileges in class. 
skills in these games that students may use in a project In a classroom setting, achievements should be environment.[3] If a teacher or university could meaningful, relevant, and worth something more than identify what makes a video game fun and bring it into just bragging rights and an arbitrary point score. For a classroom setting, the way students think about class this example, there will be a standard class with and school in general could change. Concepts like homework, three exams and a final exam. Each

18 achievement the student obtains will be worth 1% extra Some of the appealing things about credit on their final grade, up to 10%. Meaning achievements in video games is they may ask you to obtaining achievements could improve a student's grade play the game in a completely different, unconventional an entire level, which could be very attractive to a way. For the classroom, an example here could be: student. Achievements should be difficult to obtain, to challenge the player or student, but always within Class Clown – 0%+ reach, in order to not discourage them. A professor The student made a relevant, tasteful joke that was does not want their student to become discouraged, just funny to the entire class. as a developer does not want their player to be discouraged. The following are example achievements Class Chef – 0%+ for the classroom in the form that one might see them in The student brought food for the entire class. a video game. While these have nothing to do with the class Perfect Attendance - +1% content, they can still add to the class by adding student The student has attended every class or provided recognition. Achievements give students something to documented reason for absence. strive for beyond a good grade and could add to the entirety of the learning process. Perfect Homework Grade – 1%+ The student has obtained a perfect score on a 2.2 Progress Meters homework assignment. A progress meter in a video game is a visual Every Homework Assignment Completed – 1%+ representation or gauge that tracks the player's progress The student has completed and turned in every through the game. The player can see a “75%” homework assignment. showing them that they are three quarters a way through the game, or three quarters done with the 80% or Better! - 1%+ content in the game. This could even be multiple The student has obtained “Every Homework meters within an overall meter. In a classroom, these Assignment Completed” and has also obtained a grade meters could be used to represent a grade, or the of 80% or better on the homework assignments. student's progress through a certain chapter in the text book, or all chapters. The example here could be: Perfect Test Grade – 1%+ The student has obtained a perfect test grade. Homework [X|X|X|X|_|_|_|_|_|_] - 40% Completed 5-Time Participation – 1%+ The student has participated 5 times in class discussion Lab before a certain date. 3/10 – 30% Completed

Class Example – 1%+ Chapter Reading The student has worked an example problem in class on 57% Completed the board. Overall Extra Credit Lv. 1 – 1%+ 83/100 - Current Possible Grade :: B- The student has completed X amount of extra credit problems; this can only be obtained 3 times. Having that visual knowledge of progress can keep a student interested and motivated. Another thing Extra Credit Lv. 2 – 1%+ some video games do with progress meters is to allow See above. the player to exceed past the maximum, giving a great sense of accomplishment. That example could be: Extra Credit Lv. 3 – 1%+ See above. Homework [Q|Q|X|X|X|X|X|X|X|X] – 120% Completed! These are merely examples, as the professor may feel that the student should do some of these with Lab out extra credit, but as a class requirement. 11/10 – 110% Completed

Chapter Reading
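The progress meters above are shown only as static text; as a small illustration, the following Java sketch renders a meter in the same style and lets completion exceed 100% for extra credit. The ProgressMeter class and its render method are hypothetical names introduced here, not part of the paper or of any course software.

    public class ProgressMeter {
        // Renders e.g. completed=4, total=10 as "Homework [X|X|X|X|_|_|_|_|_|_] - 40% Completed".
        // Completion beyond the nominal total (extra credit) simply pushes the percentage past 100%.
        static String render(String label, int completed, int total) {
            StringBuilder cells = new StringBuilder("[");
            for (int i = 0; i < total; i++) {
                cells.append(i < Math.min(completed, total) ? "X" : "_");
                if (i < total - 1) cells.append("|");
            }
            cells.append("]");
            int percent = Math.round(100f * completed / total);
            return label + " " + cells + " - " + percent + "% Completed";
        }

        public static void main(String[] args) {
            System.out.println(render("Homework", 4, 10)); // 40% Completed
            System.out.println(render("Lab", 11, 10));     // 110% Completed (extra credit)
        }
    }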

2.3 Unlockables

An unlockable in a game is an item, such as a new game mode, character, level, music track, or picture, hidden from the player until they have accomplished a certain task in the game or received the key needed to make it available. While some may think unlockables are a good thing, as they could motivate the player to find new things in a game or allow new content to appear over time, others think that everything in the game should be available to the player from the start. While both sides have valid arguments, unlockables could be very beneficial in a classroom setting. They give the student another form of motivation to do well in class or, for those looking for extra challenge, something to strive for. The following examples are what unlockables could be in the classroom setting:

Extra Cheat Sheet
Complete all homework assignments up to the test and be allowed to bring another cheat sheet for the exam.

Extra Credit Allowed
Complete all homework assignments to be allowed extra credit.

Online Solutions
Attend 80% of the classes to be allowed to look at homework solutions at any time.

Lowest Homework Grade Dropped
Complete all of the labs before finals week to have the lowest homework score removed from grading.

These examples show clear goals and rewards for completing them. Unlike achievements, an unlockable is something tangible and useful to the student. An important thing to remember about unlockables, however, is that they should not be essential to doing well in the class.

2.4 Leaderboards

A leaderboard in a game normally states a player's rank, name, and score. This gives a comparison to other players and could motivate a player to attempt to improve a score to improve their rank. Some classes already incorporate leaderboards by displaying a grade ranking or an assignment score ranking. Knowing their standing for every assignment, test, lab, and the overall class keeps students motivated and competitive. In video games, there are online leaderboards for almost everything in the game, from highest score in every level, to time cleared, fewest shots fired, etc. While there is definitely a concern with grade privacy for students, something as simple as mentioning a student's rank in the class without showing the others could continue to motivate the students.

2.5 Online Resources

With internet resources more accessible than ever at home and on campus, integrating classroom features, along with the concepts above, into an easy-to-use, secure website would improve student-teacher interaction. The student could have access to scores, previous homework, current assignments, moderated forums, homework solutions, study guides, achievements, unlockables, and progress meters. If the content is well organized and visually friendly, students may access the class website more frequently and be more involved in the class. Developers also want this type of interaction for their games, as having the player interested in interacting outside of the game is very appealing for improved sales. Professors could hope for improved grades and student interaction. At New Mexico State University, the online Blackboard system has built-in features that could possibly be used to accommodate these ideas. Conditional release of pages, moderated forums, and customizable pages could allow more of these concepts to be integrated easily.

3. Applied Examples

For CS117, Computer Animation Using Alice, some of these techniques are already in use or have been used. Currently, simple progress meters are being implemented to show the students their progress through the course and give them the ability to score over 100%. The extra credit offered consists of extended topics that require a little extra work outside of what is given in class, but encourage things like good programming practices or exploring other features that are useful but not essential to the project. Some extra credit is even considered secret, much like secret achievements in games. The extra credit is not worth much, but it still encourages the student to try new things. As a result of trying these techniques, students move beyond project requirements and strive to complete as much as possible. Through observation, class attendance is also higher than it has been in previous semesters.

At Indiana University, Professor Lee Sheldon has incorporated another video game idea, "leveling up," a method in which a player accomplishes tasks in the game to earn experience points, which are then applied to improve the player in various ways. In Professor Sheldon's class, the student gains experience points through attendance, high grades, and participation. As a result, attendance in class is up and grades are higher than they were in previous classes. [2]

4. Conclusions

With a few of these concepts integrated, there has been an improvement in class attendance and grades. Applying more of these concepts to the classroom could bring even more improvement and enjoyment for the students. As with many things, more work remains to be done. Thankfully, with the current technology trend, a one-time setup of these concepts may be all a professor needs to implement the features discussed here. While some of these concepts are used in different ways, expanding them into something a video game developer might use in their game could make classes more engaging, interactive, challenging, and fun for the students involved. Future work will expand the concepts studied in the CS117 course at NMSU through more specifically planned projects and closer integration with Blackboard. Since CS117 is built around attendance and projects only, retooling the attendance grade into a participation grade would give more flexibility in applying the concepts discussed. In addition, if time permits and resources are available, these ideas will also be used during the all-female Young Women in Computing Camp during the 2010 Alice Summer Session. Observations will be made to see whether any differences in the effectiveness of this "video game method" of teaching can be found between men and women.

References

1. B. Boyer, NPD: 2007 U.S. Game Industry Growth Up 43% To $17.9 Billion, 2008.
2. J. Schell, DICE 2010: "Design Outside the Box," 2010.
3. J. P. Gee, What Video Games Have To Teach Us About Learning and Literacy, ACM Computers in Entertainment, Vol. 1, No. 1, Oct. 2003, Book01.
4. H. C. Arnseth, Learning to Play or Playing to Learn – A Critical Account of the Models of Communication Informing Educational Research on Computer Gameplay.

Finding Patterns of Terrorist Groups in Iraq: A Knowledge Discovery Analysis

Steven Nieves
Department of Electrical & Computer Engineering and Computer Science
Polytechnic University of Puerto Rico
Hato Rey, PR 00918, USA
Email: [email protected]

1. Abstract

In this paper we explore the application of data mining to model terrorism activity in Iraq. To this end, we experiment with the use of data mining algorithms to support the process of identifying terrorism patterns. We applied data mining techniques to real terrorism data from the Global Terrorism Database (GTD) of the National Consortium for the Study of Terrorism and Responses to Terrorism (START). The data mining techniques used in this study discover inherent information about different terrorist organizations according to the different types of terrorist acts they commit. The results in this paper should be practical not only for counter-terrorism security analysts, but also for determining the prioritization and geographical allocation of military and law enforcement resources.

Key Terms – Data Mining, Clustering, Classification, Association.

2. Introduction

Perhaps the most used definition of terrorism has been adopted by the United States State Department, Department of Defense, and Central Intelligence Agency. In their terms, terrorism is the premeditated, politically motivated violence perpetrated against non-combat targets intended to influence an audience [1]. Terrorism has expanded to the point where terrorist movements have become a significant influencing factor in international politics [2]. For this reason, numerous efforts to study terrorist activity have emerged.

Terrorism can take many forms, but each needs to be addressed through a different approach. An approach that addresses insurgency violence with heavy military artillery and airstrikes is not reasonable, nor is assigning police to take on terrorist hideouts. In fact, different types of violence will require different solutions, depending on their specific character. By analyzing the data of terrorist acts and identifying and classifying terrorist groups, we can increase our perception of their mode of operations. Eventually, this will facilitate the discovery and development of significant targeted intervention strategies.

Data mining is a multidisciplinary domain that includes efforts from the areas of database technology, artificial intelligence, machine learning, neural networks, statistics, pattern recognition, knowledge-based systems, high-performance computing, and data visualization. As data mining has developed, it has come to be commonly acknowledged as a particular stage in a larger process known as Knowledge Discovery. As shown in Figure 1, the term Knowledge Discovery refers to the extensive process of finding knowledge in databases, and it is focused on the actions leading to tangible data analysis, including the assessment and presentation of results.

Figure 1. Phases of Knowledge Discovery Process

This study explores the application of the Knowledge Discovery process to model terrorism activity in Iraq. To this end, we experiment with the use of data mining algorithms to support the process of determining the prioritization and geographical allocation of military and law enforcement resources. The study brings out different terrorist groups that are capable of attacking U.S. military installations. The main objectives of this study are as follows: (1) to identify terrorist groups with the capability of attacking military targets; and (2) to identify the types of weapons these terrorist groups are likely to use.

The main objectives of this study are as follows: (1) to identify terrorist groups with the capability of attacking military targets; and (2) to identify the types of weapons these terrorist groups are likely to use.

3. Methods

3.1 Data Selection

The GTD database was provided by the START program from the University of Maryland [6]. We extracted a subset of the data that contains terrorism records ranging from 1991 to 2007, for the purpose of using contemporary information. We also selected those acts of terrorism documented for the country of Iraq, eliminating unrelated instances belonging to other geographical locations and giving importance only to the geographic area under study for this paper.

3.2 Data Preprocessing

We eliminated irrelevant attributes that would not have added significance to the analysis processes. The attributes maintained for the dataset were the date and city location of the incident, the type of weapons used to commit the terrorist act, the number of casualties, the number of wounded victims, the type of attack and the identified terrorist group responsible. The dataset instances with missing information were also eliminated from the database. This step resulted in a reduced set of 189 clean instances in the dataset.

3.3 Data Transformation

The GTD was imported into a Microsoft Access database for ease of generating queries. After completing the data preprocessing phase, the resulting instances were exported from our local database and converted to comma delimited format. The comma delimited format allows the data mining tools used to read the resulting files directly without additional formatting.

3.4 Data Mining

In the data mining phase we analyze the data using a set of algorithms. In this approach we applied the K-means algorithm for our clustering technique [3]. For classification learning, the OneRule algorithm is used [4]. Finally, we apply the Apriori algorithm for generating association rules [5].

3.4.1 K-means

The dataset was fed into the RapidMiner tool for performing clustering, the first data mining technique applied. Next the k-means operator is added to the project and the k parameter is assigned. For this case, the number of clusters that will be generated is 6, conforming to the number of values for the dataset attribute 'AttackType1_txt', which represents the type of attack for each instance in the dataset. The result of the K-Means clustering operation in Figure 2 shows the dispersed terrorism acts in relation to the attack types.

Figure 2: K-Means Clustering using RapidMiner (clusters cluster_0 through cluster_5 plotted against the attack types Assassination, Armed Assault, Bombing/Explosion, Hijacking, Hostage Taking and Infrastructure Attack)
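To make the clustering step concrete, the sketch below reproduces the same idea with scikit-learn's k-means instead of RapidMiner's operator. The input file name and every column name other than 'AttackType1_txt' are illustrative assumptions, not artifacts of the study.

    # Hypothetical sketch: k-means with k=6 over one-hot encoded GTD attributes.
    # Column and file names are assumed; the paper used RapidMiner, not scikit-learn.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import OneHotEncoder

    df = pd.read_csv("gtd_iraq_1991_2007.csv")          # 189 cleaned instances (assumed file)
    features = ["AttackType1_txt", "WeaponType1_txt", "TargetType1_txt", "City"]

    # K-means needs numeric input, so the categorical attributes are one-hot encoded first.
    encoder = OneHotEncoder(handle_unknown="ignore")
    X = encoder.fit_transform(df[features])

    # k = 6, matching the six distinct values of AttackType1_txt in the subset.
    kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
    df["cluster"] = kmeans.fit_predict(X)

    # Dominant attack type per cluster, roughly what Figure 2 visualizes.
    print(df.groupby("cluster")["AttackType1_txt"].agg(lambda s: s.value_counts().index[0]))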

3.4.2 OneR

This next step calls the clusters individually and applies the OneR classifier. The algorithm generates one rule for every value of the attribute and selects the best rules. For this step we used the WEKA data mining tool, using as input the results of the previous clustering operation and applying the classifier individually to each cluster's subset. The goal of this step is to find rules that identify terrorist groups and target types within the clusters. Table 1 shows the generated rules using OneR in cluster_0. We then selected only the terrorist groups that contain the military target type, as shown in Table 2.

Table 1: Results of OneR using Weka

Terrorist Group | Target
Mujahedeen Brigades | Private Citizens & Property
Tawhid and Jihad | Maritime
Jaish al-Ta | Business
Al-Qa`ida | Private Citizens & Property
Ansar al-Sunna | Government (General)
Takfir wal-Hijra (Excommunication and Exodus) | Military
Brigades of Iman Hassan-al-Basri | Journalists & Media
Factions of the Mujahedeen Army | Other
Ansar al-Tahwid wal Sunna | Business
Mujahedeen Shura Council | Police
Islamic Army in Iraq (al-Jaish al-Islami fi al-Iraq) | Military
Anbar Salvation Council | Terrorists
Ansar al-Islam | Police
Islamic State of Iraq (ISI) | Police
Diyala Salvation Council | Terrorists
Kurdistan Free Life Party | Military
Kurdish Democratic Party-Iraq (KDP) | Military

3.4.3 Apriori

In this step of our analysis we initiate a review of each cluster by applying Apriori to generate association rules. By default, Apriori tries to generate ten rules. It begins with a minimum support of 100% of the data items and decreases this in steps of 5% until there are at least ten rules with the required minimum confidence, or until the support has reached a lower bound of 10%, whichever occurs first. Also, the minimum confidence is set to 0.9. The goal of this step is to use different attributes to complement the information gained in the previous step. We applied this step only to clusters where military target types were found. As an example, Table 2 presents association rule results for cluster_0 using Apriori with the Weka tool.
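For readers who want the rule-mining step spelled out, below is a minimal, hand-rolled Apriori-style sketch over "attribute=value" items with the 10% support floor and 0.9 confidence threshold described above. It is not the Weka implementation used in the study, and the example records and attribute names are invented for illustration.

    # Minimal Apriori-style sketch (not Weka): level-wise frequent itemset counting,
    # then rules filtered by minimum confidence, mirroring the settings described above.
    from itertools import combinations

    def frequent_itemsets(transactions, min_support):
        n = len(transactions)
        items = {frozenset([i]) for t in transactions for i in t}
        frequent = {}
        level = {s for s in items if sum(s <= t for t in transactions) / n >= min_support}
        while level:
            for s in level:
                frequent[s] = sum(s <= t for t in transactions) / n
            # Join step: combine itemsets that differ by exactly one item.
            level = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
            level = {c for c in level
                     if sum(c <= t for t in transactions) / n >= min_support}
        return frequent

    def rules(frequent, min_conf=0.9):
        out = []
        for itemset, supp in frequent.items():
            for r in range(1, len(itemset)):
                for lhs in map(frozenset, combinations(itemset, r)):
                    conf = supp / frequent[lhs]      # subsets of a frequent set are frequent
                    if conf >= min_conf:
                        out.append((set(lhs), set(itemset - lhs), supp, conf))
        return out

    # Each record becomes a set of "attribute=value" items (illustrative values only).
    records = [
        {"AttackType=Bombing/Explosion", "WeaponType=Explosives", "TargetType=Military"},
        {"AttackType=Bombing/Explosion", "WeaponType=Explosives", "TargetType=Police"},
        {"AttackType=Assassination", "WeaponType=Firearms", "TargetType=Government"},
    ]
    for lhs, rhs, supp, conf in rules(frequent_itemsets(records, min_support=0.1)):
        print(f"{lhs} ==> {rhs}  (support={supp:.2f}, conf={conf:.2f})")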

Table 2 Association Rules

1. Terrorist Group=Al-Qa`ida Weapon Type=Explosives/Bombs/Dynamite 17 ==> Attack Type=Bombing/Explosion 17 conf:(1)
2. Target Type=Private Citizens & Property Weapon Type=Explosives/Bombs/Dynamite 7 ==> Attack Type=Bombing/Explosion 7 conf:(1)
3. Attack Type=Bombing/Explosion Target Type=Private Citizens & Property 7 ==> Weapon Type=Explosives/Bombs/Dynamite 7 conf:(1)
4. Target Type=Police Weapon Type=Explosives/Bombs/Dynamite 7 ==> Attack Type=Bombing/Explosion 7 conf:(1)
5. Attack Type=Bombing/Explosion Target Type=Police 7 ==> Weapon Type=Explosives/Bombs/Dynamite 7 conf:(1)
6. Target Type=Military Weapon Type=Explosives/Bombs/Dynamite 7 ==> Attack Type=Bombing/Explosion 7 conf:(1)
7. Target Type=Military Terrorist Group=Al-Qa`ida Weapon Type=Explosives/Bombs/Dynamite 6 ==> Attack Type=Bombing/Explosion 6 conf:(1)
8. Weapon Type=Explosives/Bombs/Dynamite 27 ==> Attack Type=Bombing/Explosion 26 conf:(0.96)
9. Attack Type=Bombing/Explosion 27 ==> Weapon Type=Explosives/Bombs/Dynamite 26 conf:(0.96)
10. Attack Type=Bombing/Explosion Terrorist Group=Al-Qa`ida 18 ==> Weapon Type=Explosives/Bombs/Dynamite 17 conf:(0.94)

4. Results

The proposed analysis structure is used along with various data mining techniques and tools. The K-means clustering technique grouped the dataset with possible attack patterns and served as a foundation for organizing the data. Figure 4 shows the distribution of the data assigned to each cluster.

Figure 4: Terrorist Acts by Clusters

For each cluster, we implemented the OneR classification technique, which in return generated one attack target for each terrorist group within the clusters. The results in Table 2 present the terrorist groups with military target capability and their membership cluster.

Table 2: Terrorist Groups Targeting Military

Terrorist Group | Target | Cluster
Takfir wal-Hijra (Excommunication and Exodus) | Military | cluster_0
Islamic Army in Iraq (al-Jaish al-Islami fi al-Iraq) | Military | cluster_0
Kurdistan Free Life Party | Military | cluster_0
Kurdish Democratic Party-Iraq (KDP) | Military | cluster_0
Ansar al-Sunna | Military | cluster_2
Al-Qa`ida | Military | cluster_5

Using the Apriori algorithm we generated association rules. The association rules can serve as a complement for determining the attack type and weapon type the identified terrorist groups are most likely to use in an attack, based on the information of historic data. Rules that are redundant or irrelevant have been discarded in order to maintain the most coherent ones. Therefore, analyst intervention is required to inspect the association rules that will give additional information necessary for describing the selected terrorist groups.

Table 3: Association Rules

Rule
Target Type=Military Weapon Type=Explosives/Bombs/Dynamite 7 ==> Attack Type=Bombing/Explosion 7 conf:(1)
Target Type=Military 2 ==> Attack Type=Bombing/Explosion 2 conf:(1)
Attack Type=Assassination 15 ==> Weapon Type=Firearms 15 conf:(1)
Weapon Type=Explosives/Bombs/Dynamite 40 ==> Attack Type=Bombing/Explosion 38 conf:(0.95)

5. Conclusions

Armed with the results of each of our data mining techniques, we can raise our insights on terrorist operations and provide security analysts with the information needed to put intervention strategies into action. Perhaps the most interesting observation is that terrorist groups are classified in terms of the reasons for their actions, ethnic beliefs and political views, but not for their actions themselves. As a recommendation, counter-terrorist and law enforcement agencies should use all available means to establish a method that enables reclassification of terrorist groups based on collected data and ongoing analysis. This approach can greatly contribute to the effort of prioritizing counter-strikes without underestimating terrorist group capabilities.

6. References

1. C.E. Stout, The Psychology of Terrorism: Theoretical understandings and perspectives, Greenwood Publishing Group, 2002.
2. B. Lia, Globalisation and the Future of Terrorism, New York, NY: Routledge, 2005.
3. K. Fukunaga, Introduction to Statistical Pattern Recognition, Boston: Academic Press, 1990.
4. R.C. Holte, "Very Simple Classification Rules Perform Well on Most Commonly Used Datasets," Machine Learning, vol. 11, 1993, pp. 63-90.
5. R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules in Large Databases," Proceedings of the 20th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., 1994, pp. 487-499.
6. Global Terrorism Database, START: CD-ROM.

Morphological Feature Extraction for Remote Sensing

José G. Lagares ([email protected])
Master in Computer Science
Dr. Alfredo Cruz
Electrical & Computer Engineering and Computer Science
Polytechnic University of Puerto Rico

Abstract - Feature based image registration consists of four steps: feature detection, feature matching, transform model estimation, and image resampling & transformation. Many feature based systems still rely on manual extraction of control points. The weakness of a manual process has made automatic detection of features a more suitable alternative. In this work, we describe a system that relies on algorithms for automatically finding specific landmarks or features on remote sensed data. This work was based on the use of mathematical morphology, for enhancing feature-like patterns, along with Scale Orientation Morphological Profiles for selecting the points which represented the strongest set of features in this kind of data. Using erosion, dilatation, opening and closing operations to build SOMPs, it was possible to implement a software system capable of achieving sub-pixel accuracy.

1. INTRODUCTION

Image registration is the process of aligning images from different sources, taken at different times, or from different points [1]. One of its most important goals is to find commonalities between these images, in order to find a transformation that will put them in the same coordinate system. Usually, the selection of the suitable method for registration is based on the context of the required task at hand [2]. In this work we will focus on image registration methods for processing remote sensed data captured by satellites (Figure 1). Remote sensing is the small or large scale acquisition of information about an object, by sensing devices that are not physically in contact with the object, through the use of recording or real-time sensing devices. Satellites collect different spectra of earth data, such as meteorological and geographical, to mention a few. An automatic, fast and accurate image registration system is an essential step in order to make this data usable. This work presents a software system capable of automatically extracting features from satellite images. For this purpose, we rely on various techniques including mathematical morphology [3], which allows the computations for creating morphological profiles that will point to the most relevant features in image data.

Figure 1. Remote Sensing Satellites

2. METHODS

2.1 Sources of distortion

The need for image registration comes from differences (distortions) between image data of the same content, object or scene. Once the source of distortion is known, an appropriate approach can be selected for realigning the images. Specifically, the distortions which are the cause of misalignment [4] are those which will define the class of transformation that will align the images. The source of distortion may be differences in the acquisition process [4]:
• different views: The scene is captured from different viewpoints, caused by variations in orbits, earth rotation, platform movement, etc.

• different times: Images of the same scene are taken at different times; distortions may be caused by different atmospheric conditions, sun position, or geographical changes. In many cases these sources of distortion are not the cause of misalignment, and we have to find suitable methods that take them into consideration without removing them.
• different devices: Different instruments acquire different kinds of information from the same scene. In these cases, our image registration technique needs to have a priori information to handle the distortions to be found, and to correctly relate just the common elements of the images, in order to perform an accurate registration.

2.2 Image Registration Steps

In [5] the four basic feature-based image registration steps are clearly identified:
• Feature detection: Involves the identification of distinctive objects or salient characteristics across the image. Objects such as lines, edges, contours, line intersections, corners, and regions can be identified as features.
• Feature matching: Computing and establishing the correspondence between the features detected in the sensed image and those in the reference image.
• Transform model estimation: The type and parameters of the mapping functions that align both images are estimated.
• Image resampling and transformation: The sensed image is transformed by means of the mapping functions.

This work is focused on the feature detection step. Feature matching, transform estimation, and image resampling are out of the scope of this work.

2.3 Feature Detection with Mathematical Morphology

Mathematical Morphology is a technique based on set theory for the analysis and processing of geometrical structures [6]. It is used by applying operations on two sets: one the image data, and the other the structuring element. The structuring element acts as a filter for enhancing or suppressing image structures.

Erosion

For erosion, in gray-scale morphology the structuring element is tested over all pixels of the image. It selects the pixel with the lowest value from those that fit the structuring element and replaces the original pixel with that value. The formula for erosion is described in (1):

$(f \ominus B)(x, y) = \min_{(s,t) \in Z^2(B)} f(x + s, y + t)$,   (1)

where $(x, y) \in Z^2$ and $Z^2(B)$ denotes the set of discrete partial coordinates associated with pixels lying around the neighborhood defined by B.

Dilatation

The dilatation operation tests every pixel against the structuring element and includes the maximum value of the ones that fit the structuring element. Using the same notation as for erosion, dilatation can be computed with the following equation (2):

$(f \oplus B)(x, y) = \max_{(s,t) \in Z^2(B)} f(x - s, y - t)$.   (2)

Opening

The opening operation works by first performing an erosion on the pixel data, and then processing the output with a dilation operation. The formula for opening is described below (3):

$(f \circ B)(x, y) = ((f \ominus B) \oplus B)(x, y)$.   (3)

Closing

The closing operator works in the opposite way of opening. It performs a dilatation operation, and then an erosion, on all the pixels of the image. The formula for closing is shown next (4):

$(f \bullet B)(x, y) = ((f \oplus B) \ominus B)(x, y)$.   (4)
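As an illustration of equations (1)-(4), the sketch below applies the four gray-scale operations to a synthetic band using SciPy's ndimage module. The structuring element size is arbitrary, and this is not the implementation used in the described system.

    # Sketch of the gray-scale morphology operations in equations (1)-(4) using SciPy;
    # the original system is a custom implementation, this is only illustrative.
    import numpy as np
    from scipy import ndimage

    image = np.random.randint(0, 255, size=(256, 256)).astype(float)  # stand-in for band data
    selem = np.ones((5, 5))   # flat structuring element B (size chosen arbitrarily here)

    eroded  = ndimage.grey_erosion(image, footprint=selem)   # (1) min over the neighborhood
    dilated = ndimage.grey_dilation(image, footprint=selem)  # (2) max over the neighborhood
    opened  = ndimage.grey_opening(image, footprint=selem)   # (3) erosion followed by dilation
    closed  = ndimage.grey_closing(image, footprint=selem)   # (4) dilation followed by erosion

    # Opening/closing residues highlight bright/dark structures smaller than B,
    # which is the kind of feature-like pattern the profiles are built from.
    bright_top_hat = image - opened
    dark_top_hat   = closed - image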

With the structuring element acting as a filter, mathematical morphology operations allow control over which characteristics of an image, and how much of them, can be enhanced, diminished or removed. This functionality makes it suitable for feature detection when processing remote sensed data. With domain knowledge, and correct selection of combinations of structuring elements, it can be used to specifically detect edges, bridge crossings, rivers, mountains, islands, and many other characteristics or important unique features of the images.

2.4 Scale Orientation Morphological Profiles

Using the morphological operations, and structuring elements of increasing sizes and orientations, all pixels of the input images are processed. Then, computing the distances from the original pixels to the processed ones provides the data necessary to build Scale Orientation Morphological Profiles [7] for each image pixel. Using self-information (entropy), arrays were computed storing the entropy values for each image pixel location. Selecting the highest scores from the array provides the points with the highest chance of representing one of the most important or strongest features of the images. These points could represent the basis for extracting chips (small windows) to be registered against reference data.
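A minimal sketch of the profile-and-entropy idea in this section is shown below, assuming a single band, openings at a few scales only (no orientations), and a plain Shannon-style entropy of the normalized profile. It is a simplification for illustration, not the SOMP implementation of the system.

    # Simplified sketch of Section 2.4: build a scale-only morphological profile per pixel,
    # score pixels with an entropy measure, and keep the highest-scoring locations.
    import numpy as np
    from scipy import ndimage

    def profile_entropy_scores(image, sizes=(3, 5, 7, 9, 11)):
        # Distance between each pixel and its opening at several scales forms the profile.
        profile = np.stack([np.abs(image - ndimage.grey_opening(image, size=(s, s)))
                            for s in sizes], axis=0)
        # Normalize each pixel's profile and compute an entropy (self-information) score.
        p = profile / (profile.sum(axis=0, keepdims=True) + 1e-12)
        return -(p * np.log10(p + 1e-12)).sum(axis=0)

    def top_features(image, k=50, sizes=(3, 5, 7, 9, 11)):
        scores = profile_entropy_scores(image, sizes)
        flat = np.argsort(scores.ravel())[::-1][:k]          # the k best-scoring pixels
        rows, cols = np.unravel_index(flat, scores.shape)
        return list(zip(rows.tolist(), cols.tolist()))

    # points = top_features(band, k=50)   # chip centers to register against reference data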

3. RESULTS

In this work, we successfully implemented a software system capable of processing satellite data and automatically finding the most important characteristics or features in them. In this section we show the output of our system when processing two image data sets from NASA's Landsat satellite program.

Figure 2 shows the output of our system after processing the LANDSAT 7 N-13-25_2000 image with all the operations. The squares represent the points with the highest entropy scores, and the brightness of the squares represents the order of the scores within this group, the brightest being the highest, fading into the lowest scores of the top 50.

Figure 2. LANDSAT 7 N-13-25_2000 processed by SOMPs and entropy; the squares are the detected features.

Notice how all the selections were made into or very close to the path of the mountain area or other important characteristics of the image, like the selected river. None of the non-unique parts (like plains) were selected. This was the expected and desired behavior, since we don't want ambiguity in the selected feature sets.

Next, in Figure 3, the selected set for image LANSAT 7 N-38-10_2000 is shown. In this case, many of the selected features represent edges or small islands; these were all automatically detected from the image. Figure 4 shows the output after processing an image from RADARSAT-2.

Figure 3. Highest entropy selections for LANSAT 7 N-38-10_2000

Figure 4. RADARSAT-2 Standard Quad-POL selections

Here, for demonstrative purposes, many selected points are rendered. In a registration system implementation, only the top 10 points could be selected for extracting the windows/chips to be paired to the input/reference data.

With the goal of showing the accuracy of the system, we now present an extreme example. Normally the misalignment distortion in remote sensed data would be no more than a few degrees, but here we tested the system with the LANSAT 7 N-38-10_2000 image rotated 90 degrees. Figure 5 shows the output after processing such an image.

Figure 5. Selections for the 90 degree rotated LANSAT 7 N-38-10_2000

A 90 degree rotation is a very drastic example, very far from what is commonly found in the remote sensing context, but since one of the goals was achieving sub-pixel accuracy, this example represents a very strong point towards proving the accuracy of the implemented system. Most of the selections the software made, on both the original image and the rotated one, are very closely matched.

4. CONCLUSION

Mathematical morphology and SOMPs have shown to provide a good means for performing feature selection on remote sensed data. We were able to obtain fairly good results, especially in terms of accuracy. While commonly, in remote sensing applications, rotation distortions could be a few degrees, we tested very drastic rotations obtaining very precise results. Given its efficiency and reliable output, validated by our software implementation, we understand it can be easily integrated with an image registration application such as NASA's TARA, or any other image registration system that needs to achieve sub-pixel accuracy.

ACKNOWLEDGMENTS

This work was supported in part by the NSF Grant CNS-0837556 and Alfredo Cruz, PhD, Computer Science program coordinator for the Polytechnic University of Puerto Rico.

REFERENCES

[1] D. Capel, "Image mosaicing and super-resolution", Distinguished Dissertations, Springer, New York, 1974.
[2] X. Jia, J. A. Richards, and D. E. Ricken, Remote Sensing Digital Image Analysis: An Introduction, 4th ed., Springer-Verlag, New York, 1999.
[3] A. Plaza, J. Le Moigne and N. S. Netanyahu, "Automated Image Registration Using Morphological Region of Interest Feature Extraction", IEEE Intl. Workshop on Analysis of Multi-Temporal Images, Biloxi, Mississippi, pp. 99-103, 2005.
[4] L. G. Brown, "A Survey of Image Registration Techniques", ACM, New York, January 12, 1992.
[5] B. Zitova, J. Flusser, "Image Registration Methods: A Survey", Department of Image Processing, Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, June 2003.
[6] J. Serra, "Image Analysis and Mathematical Morphology", Academic Press, New York, 1982.
[7] A. Plaza, J. Le Moigne, and N. S. Netanyahu, "Morphological Feature Extraction for Automatic Registration of Multispectral Images", Geoscience and Remote Sensing Symposium, IGARSS 2007, IEEE International, pp. 421-424, 2007.

Unsupervised Clustering of Verbs Based on the Tsallis-Torres Entropy Text Analyzing Formula

Gina Colón Electrical & Computer Engineering and Computer Science Department Polytechnic University of Puerto Rico, 377 Ponce de León Ave. Hato Rey, PR 00919 Email: [email protected]

Abstract

This work proposes a method of clustering terms in Wordnet based on the Tsallis-Torres entropy measure for analyzing text variety. This study serves to determine whether this entropy measure serves as an efficient measure to cluster, in an unsupervised manner, verbs from Wordnet. The testing consists of evaluating the entropy level of combinations of n number of verbs, using the number and frequency of the words in each selected verb's definitions.

1. Introduction

As the availability of data in enormous volumes has dramatically grown in recent years [3], the need to derive useful information from this data has become imperative. Traditional methods of extracting information through data mining (e.g. tabulation, supervised learning, etc.), which primarily rely on manual work of analysis and interpretation, have become impractical. They are time-consuming, very expensive, highly subjective, and prone to errors and inconsistencies [1].

Most available data is in an unstructured form. That is, it is in the form of human written or spoken language, also known as natural language. Language is rich in patterns, and also in exceptions. It is inherently ambiguous as it has innumerable degrees of variation, and is dependent upon many different factors, such as geographical region and domain of use [4].

As written and spoken language evolve over time, accommodating different uses or applications, automatic acquisition of lexical knowledge by a computer plays a critical role in the knowledge discovery process. Crucial to this process is enabling computers to organize and interpret data in such a way that it allows them to process language on their own [6]. Automatic or unsupervised language learning empowers computers to be able to process information from multiple sources to minimize ambiguity [4].

One particular data mining technique frequently used in unsupervised learning is clustering. It attempts to group similar items together within the same cluster and dissimilar objects in other clusters. In clustering, the distance between items in each cluster can be calculated using different measures, such as entropy. Borrowed from thermodynamics, entropy evaluates similarity or dissimilarity among items in a given set [7]. Entropy is widely used in data mining and computer science in general, because the measure is tolerant to incomplete, noisy or corrupted data [2].

The purpose of this paper is to present an unsupervised method for processing Wordnet or another lexical database using a normalized entropy measure developed by Torres [7], based on Tsallis's work [8], to organize clusters of similar terms using the terms' definitions.

To conduct this experiment, the Wordnet lexical database was used. It is a system inspired by psycholinguistic theories, and commonly used for natural language processing experiments. Wordnet differs from other dictionaries in that it only contains four types of parts of speech, or POSs: nouns, verbs, adjectives and adverbs. According to Miller [5], all other words are assumed to be function words and are omitted from Wordnet. This research is part of an ongoing thesis investigation.

2. Methodology

2.1 Data Preparation

The data set for this experiment contains all the terms in Wordnet, along with their corresponding definitions. These were indexed into a relational database, where each term was assigned a unique number to serve as an identifier.

The definitions of each verb were subject to the data preparation process depicted in Figure 1, and explained in detail below:

(1) All non-word characters are removed. Any character other than letters is eliminated. This process is also known as linearization.
(2) Words in the definitions are stemmed into their base forms: past tense and 3rd person singular forms are converted to their present base form, and plural nouns into singular forms.
(3) The verb itself, if contained in its own definition, was eliminated from its definition to avoid redundancy.
(4) Text is converted to lower casing.
(5) Each word from each definition is converted to its corresponding index number or unique identifier generated from the relational database.
(6) Any remaining words from each definition that were not converted to their unique identifier were eliminated, as they are the function words that are not part of the Wordnet database [5].

Figure 1. Data Preparation Process (linearization, stemming, redundancy elimination, lower casing, conversion to unique IDs, elimination of remaining words)

2.2 Algorithm Description

The experiment involves combining the definitions of the different terms and determining their entropy level to determine which arrangements are optimal for clustering. For this purpose, each combination is subjected to word variety analysis based on the relative entropy text analyzer formula given by Torres [7]:

$E_{rel} = \frac{E_T}{E_{max}} \times 100$,

where $E_{rel}$ is a normalized measure derived from the division of the total entropy $E_T$ by the maximum entropy $E_{max}$, converted to a percentage. The total entropy is given by the formula:

$E_T = E(p_1, \ldots, p_n) = \frac{1}{\lambda} \sum_{i=1}^{n} p_i \left[\log_{10}(\lambda) - \log_{10}(p_i)\right]$,

where $p_i$ are the frequencies for each word in the text, or the times each word is found in the text of length $\lambda$. The maximum entropy $E_{max}$ is the equivalent of each word occurring once in the text. It is given by:

$E_{max} = \frac{1}{\lambda} \sum_{i=1}^{n} \left[\log_{10}(\lambda) - \log_{10}(p_i)\right] = \log_{10}(\lambda)$, with every $p_i = 1$.

As seen from these formulas, the relative entropy measure $E_{rel}$ presented by Torres is sensitive to word frequencies, and provides a normalized measure that can be used to compare clusters.

The unsupervised algorithm presented in this paper uses the relative entropy measure $E_{rel}$ to determine the affinity of different verbs to form semantically related clusters. The clustering process is shown in pseudo code in Listing 1, and explained in detail below.

Listing 1. Clustering Algorithm Pseudo Code

    function first_cycle
      for i=0 to num_terms
        for j=1 to num_terms
          Erel = calc_entropy(terms[i], terms[j])
          if Erel < entropy_threshold
            entropy_threshold = Erel
          end if
        end for
      end for
    end function

    function next_cycles
      while (num_clusters > 1 or num_clusters >= desired_num_clusters)
        for i=0 to num_stored_entries
          for j=0 to num_entries
            if (stored_entry[i] != terms[j])
              Erel = calc_entropy(stored_entry[i], terms[j])
              store_merged_cluster
            end if
          end for
        end for
        num_clusters = count_num_merged_clusters
      end while
    end function

    function find_clusters
      first_cycle()
      next_cycles()
    end function

In its first cycle, the evaluation process starts by combining the definitions of pairs of verbs to calculate Erel. In other words, the cycle combines the words in the definition of each verb with the words in the definition of the other verbs, evaluating all possible unique pairs. This initial phase initializes the entropy threshold equal to the lowest entropy index. For the next cycles, the entropy threshold marks the boundary for selecting combinations with tolerable entropy indexes.

The second phase of the process iteratively combines clusters of 3 to n number of terms, until the selected number of clusters has been reached or just one cluster remains. For each iteration, only those combinations which have a lower entropy index than the threshold are kept for the next iteration. With each cycle, the entropy threshold is re-evaluated and set to the lowest entropy index of that cycle.
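As a worked illustration of the formulas above, the following sketch computes E_rel for the combined word-ID vectors of two definitions. The toy word IDs are invented, and the function follows the formulas under the stated reading that the p_i are raw word counts.

    # Sketch of the relative entropy measure E_rel from Section 2.2: combine the word-ID
    # vectors of two (or more) definitions and measure their word variety. Illustrative only.
    from collections import Counter
    from math import log10

    def relative_entropy(words):
        lam = len(words)                       # text length, lambda
        counts = Counter(words)                # p_i: occurrences of each distinct word
        e_total = sum(p * (log10(lam) - log10(p)) for p in counts.values()) / lam
        e_max = log10(lam)                     # every word occurring exactly once
        return 100.0 * e_total / e_max         # E_rel as a percentage

    # Two toy "definitions" already converted to unique word IDs (step 5 of Figure 1).
    travel = [17, 42, 5, 99, 42]
    move   = [42, 5, 63, 17]
    print(relative_entropy(travel + move))     # lower values suggest a better cluster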

3. Preliminary Results

To demonstrate the procedure, a sample of 13 verbs was taken from the Wordnet database, where 10 of these verbs are semantically related and the other 3 are not. Table 1 lists the selected verbs. For each of these verbs, the definitions were turned into feature vectors using the unique numbers generated from the indexing of Wordnet into the relational database.

Table 1. Selection of Verbs

NUMBER | VERB
1 | run
2 | move
3 | go
4 | travel
5 | continue
6 | walk
7 | stroll
8 | crawl
9 | climb
10 | jog
11 | compensate
12 | think
13 | study

A sample of how the verbs' definitions are converted into their resulting feature vectors after Step 6 of the data preparation phase is shown in Figure 2.

Figure 2. Sample of Feature Vector Conversion

Figure 3 compares the entropy indexes which resulted from the first cycle. In this first cycle, the words in the verbs' definitions were combined to form unique clusters of two verbs. Here, the x and y axes represent the combinations of the 13 verbs listed in Table 1. Bluish colored squares represent those combinations with the lowest entropy indexes, such as the combination of travel (verb #4) and move (verb #2). Inversely, the yellowish squares are least likely to be paired because they yielded the highest entropy indexes, such as with stroll (verb #7) and crawl (verb #8). Since the ordering of the words has no relevance for the calculation of the entropy index, any color patterns in Figure 3 are merely formed by chance.

Figure 3. Results from First Cycle

After the second cycle, the algorithm selected only one cluster which had a lower entropy index than the entropy threshold. This cluster included the verbs go, travel and move. The results obtained from this preliminary experiment yielded tentative positive results. With only two cycles, the algorithm was able to identify the best candidate cluster of synonyms.

4. Future Work

Future work will focus on applying stemming algorithms to the words in the definitions, in order to minimize word variability. It will also focus on addressing parallelization as an alternative for optimizing the data processing and clustering procedures. Further testing will involve applying the entropy clustering algorithm presented here to all the terms in the Wordnet database. The resulting clusters will be compared against the manually generated semantic hierarchies of the Wordnet database. Additional considerations may involve evaluating whether to include the remaining parts of speech not included in the Wordnet database as part of the feature vectors fed to the algorithm. Finally, future work may also consider assigning weights to enhance or diminish the semantic or syntactic relevance of the words in the definitions.

5. Conclusions

Previous work on entropy shows that entropy can be successfully used as a classifier and for clustering. This work covers an alternative method of using entropy measures in an unsupervised clustering algorithm to identify clusters within the Wordnet database.

The work also presents the methods and processes used to transform the Wordnet database definitions into suitable feature vectors that can be subjected to the unsupervised entropy clustering algorithm presented in this paper.

A small sample of verbs was chosen to demonstrate the clustering algorithm procedure, and these preliminary results show how the algorithm, in principle, was able to identify a suitable cluster of three semantically related verbs. Future work will focus on expanding the data preparation process and testing on a full scale set from the Wordnet lexical database.

Acknowledgements

Publication of this work has been possible, in part, by the support of the NSF Grant No. CNS-0837556. The author would like to thank professors Kay Berkling and Eliana Valenzuela for their mentorship and support.

References

1. Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17, 37-54.
2. Halperin, E., & Karp, R. M. (2005). The minimum-entropy set cover problem. Theoretical Computer Science (348), 240-250.
3. Lyman, P., Varian, H. R., Charles, P., Good, N., Jordan, L. L., & Pal, J. (2003). How much information? 2003. Retrieved December 27, 2009 from http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/.
4. Matsumoto, Y. (2003). Lexical knowledge acquisition. The Oxford Handbook of Computational Linguistics. New York, NY: Oxford University Press.
5. Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). Introduction to Wordnet: An on-line lexical database. International Journal of Lexicography, 3, 235-244.
6. Schank, R., & Kass, A. (1986). Natural language processing: What's really involved? New Haven, CT.
7. Torres, D. F. M. (2002, October). Entropy text analyzer. 2nd Portuguese National ACM Programming Contest. Lisboa, Portugal.
8. Tsallis, C. (2002). Nonextensive statistical mechanics: A brief review of its present status. Anais da Academia Brasileira de Ciências, 74(3), 393-414.

Document Classification using Hierarchical Temporal Memory

Roxana Aparicio and Edgar Acuña Computing and Information Sciences and Engineering University of Puerto Rico, Mayaguez Campus Emails: [email protected], [email protected]

Abstract

In this paper we explore the use of hierarchical temporal memory for document classification. We propose how to set up a text classification problem so that it is suitable for the HTM technique. Also, we justify why HTM could work for document classification. The final goal is to apply this HTM technology in the context of semi-supervised classification.

1. Introduction

Hierarchical Temporal Memory (HTM) is a machine learning technique based on the memory-prediction theory of brain function (see [3] for more details). HTM models some of the structural and algorithmic properties of the neocortex [4] and it is applicable to a broad class of problems from machine vision, to fraud detection, to semantic analysis of text [2].

The novelty of this technique is that it takes into account, to a large extent, new discoveries about brain function, which can be applied to data modeling, learning from data and prediction.

Dileep [2] introduces a mathematical model for cortical microcircuits based on the theory of Bayesian belief propagation in Hierarchical Temporal Memory networks.

In this paper, we set up the Hierarchical Temporal Memory technique proposed by Dileep for the problem of document classification, taking into account the particular characteristics of written text. In our approach the order of appearance of words within a document gives us the temporal continuity needed to model the problem.

2. Hierarchical Temporal Memory

2.1 The Memory-Prediction Framework

From a computational point of view, the theory proposed by Hawkins [3] can be summarized as follows [2]:

a) The part of the brain named the neocortex builds a model for the spatial and temporal patterns that it is exposed to. The goal of this model construction is the prediction of the next pattern on the input.
b) The cortex is constructed by replicating a basic computational unit, which can be treated as a node that is replicated several times.
c) The cortex is organized as a hierarchy.
d) The cortex models the world by memorizing patterns and sequences at every node of the hierarchy. This model is then used to make predictions.
e) The neocortex builds its model of the world in an unsupervised manner.
f) Each node in the hierarchy stores a large number of patterns and sequences.
g) The output of a node is in terms of the sequences of patterns it has learned.
h) Information is passed up and down in the hierarchy to recognize and disambiguate information, and propagated forward in time to predict the next input pattern.

2.2 Overview of HTM

HTMs are organized as a tree-shaped hierarchy of nodes, where each node implements a common learning and memory function. HTMs store information throughout the hierarchy in a way that models the world.

All objects in the world, be they cars, people, buildings, speech, or the flow of information across a computer network, have structure. This structure is hierarchical in both space and time. HTM memory is also hierarchical in both space and time, and therefore can efficiently capture and model the real world [2].

HTMs are very similar to Bayesian Networks. The difference between HTM and Bayesian Networks is in the way that time, hierarchy, action, and attention are used [6].

We can think of an HTM as a memory system. The nodes in the HTM network are the basic algorithm and memory modules of the network. All nodes store information, that is, they are memory modules and contain similar algorithms. These learning algorithms abstract some characteristics about the input space and store them in the node's memory [2].

3. HTM for Document Classification

Traditional document classification generally takes as its input a set of vectors representing documents. Each vector representing a document in this space will have a component for each word. The value of this component may depend on the frequency of the word in the document and in the whole document collection. Vector-based representations are sometimes referred to as a bag of words, emphasizing that document vectors are invariant with respect to term permutations, since the original word order is clearly lost [1].

HTM is suitable for problems where the data to be modeled are generated by a hierarchy of causes that change over time. The problem should have both a spatial and a temporal component [2].

Documents have hierarchical structure in the sense that they are composed of words, phrases, sentences and paragraphs. They also have semantic features such as words, terms, and concepts, which appear in a certain sequence within the document. Taking into account this temporal nature of the appearances of words within a document, and its hierarchical structure, we believe that this problem is suitable for the use of HTM.

3.1 Training Data

We have stated before that HTM requires capturing the temporal characteristic of the real world. In order to adapt the document classification problem to the HTM model we make an analogy with the image recognition problem described in Dileep [2]. There, the author simulates transformations by continuous motions in the training data set, creating movies out of binary images. In the context of document classification we will use the natural sequence of appearance of words within a document as the temporal component of the model.

3.2 HTM Structure for Document Classification

The HTM network we will use to model document classification is organized as follows. Each document of the training set is scanned. Nodes at level 1 receive their input from a portion of a document. This portion could be a line or a predefined number of words read in order, as shown in Figure 1. A node at level two receives its inputs from four level-1 nodes.

The effective input area from which a node receives its input is called its receptive field [2]. In this kind of hierarchical arrangement, the size of the receptive field of a node increases as we go up in the hierarchy. The single node at the top of the tree covers the entire document by pooling inputs from all of the lower nodes.

Figure 1. Structure of an HTM network for document classification (nodes at levels 1 to 3). Arrows represent the information flow in the network. Nodes at level 1 receive their input from a portion of a document. Nodes at a level other than 1 receive their input from a group of nodes at the immediately lower level.
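The sketch below illustrates one way the layout of Figure 1 could be produced: level-1 receptive fields as fixed-size word windows read in order, with each higher-level node pooling four children. The fan-in of four follows the text; the window size and everything else are illustrative assumptions.

    # Sketch of the network layout in Figure 1: level-1 nodes read word windows in order,
    # and each higher-level node pools four children. Window size is an assumption.
    def level1_fields(document, words_per_node=8):
        words = document.split()
        return [words[i:i + words_per_node] for i in range(0, len(words), words_per_node)]

    def pool(children, fan_in=4):
        # Group child nodes four at a time to form the receptive fields of the next level.
        return [children[i:i + fan_in] for i in range(0, len(children), fan_in)]

    doc = "the quick brown fox jumps over the lazy dog " * 8
    level1 = level1_fields(doc)        # one word window per level-1 node
    level2 = pool(level1)              # each level-2 node covers four level-1 nodes
    level3 = pool(level2)              # the top node eventually covers the whole document
    print(len(level1), len(level2), len(level3))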

The network in Figure 1 is fed with portions of documents. The receptive field of a level-1 node is a line from the input document.

3.3 Extracting Information from Documents

The node has two phases of operation. During the learning phase, the node builds internal representations of the input patterns. A node that has finished its learning process is in the sensing/inference phase. During sensing/inference, the node produces an output for every input pattern [2]. The learning and sensing phases for the document classification problem are as follows.

3.3.1 Learning Phase

For every input pattern, the node does three operations [2]:

1. Memorization of patterns

The memory of a node stores words from its receptive field. Every word that is read from the input field of a node is compared against words that are already stored in the memory. If an identical word is not in the memory, then this word is added to the memory.

2. Learning transition probabilities

The node constructs and maintains a Markov graph. The link between two vertices is used to represent the number of transition events between the patterns corresponding to those vertices [2]. In the context of document classification, the vertices of this Markov graph correspond to the stored words. When input word i is followed by input word j for the first time, a link is introduced between the vertices i and j and the number of transition events on that link is set to one. The number of transition counts on the link from i to j is then incremented whenever a transition from word i to word j is observed at the input of the node (see Figure 2). The Markov graph can be normalized simply by dividing the number of transition events on the outgoing links of each vertex by the total number of transition events from that vertex.

Figure 2. Structure of a node: Markov graph of stored words (w1-w6) with transition counts, learned at some stage of the learning process.

3. Temporal grouping

The node uses a clustering method to form the temporal groups. The partitioning is done such that the vertices of the same temporal group are highly likely to follow one another [2]. The probability of transition between two words is used as a measure of the similarity between those words for the clustering algorithm.

3.3.2 Sensing/Inference Phase

Once the node has completed its learning process, it can be used for sensing/inference. A node in sensing/inference produces an output for every input pattern. However, it is possible for the node to use a sequence of patterns to produce an output [2].

For every input pattern, the node produces an output vector that indicates the degree of membership of the input pattern in each of its temporal groups. This membership is determined using some distance metric between stored patterns and the input pattern. The larger the distance, the smaller the match between the input pattern and the stored pattern [2].

Calculating this for every stored pattern gives the closeness of the current input pattern to all the vertices of the Markov graph. The degree of membership of the input pattern in each temporal group is determined by the maximum of its closeness to each of the vertices within the temporal group. This results in a vector of length equal to the number of temporal groups, with each component of the vector indicating the degree of membership of the input pattern in the corresponding temporal group. This vector is then normalized to sum to unity.
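To make the learning phase tangible, here is a small sketch of a single node that memorizes words, counts word-to-word transitions, normalizes the Markov graph, and forms temporal groups with a greedy threshold rule standing in for the clustering method mentioned above. It is a simplification, not the HTM reference algorithm.

    # Sketch of the learning phase of one node (Section 3.3.1). The greedy grouping rule
    # is an illustrative stand-in for the clustering step described in the text.
    from collections import defaultdict

    class NodeSketch:
        def __init__(self):
            self.transitions = defaultdict(lambda: defaultdict(int))

        def learn(self, words):
            # 1) memorization happens implicitly as keys; 2) count transition events i -> j.
            for i, j in zip(words, words[1:]):
                self.transitions[i][j] += 1

        def normalized_graph(self):
            # Divide each outgoing count by the total outgoing events of that vertex.
            return {i: {j: c / sum(out.values()) for j, c in out.items()}
                    for i, out in self.transitions.items()}

        def temporal_groups(self, threshold=0.5):
            # 3) group words whose transition probability exceeds a threshold.
            graph, groups, assigned = self.normalized_graph(), [], set()
            for i, out in graph.items():
                if i in assigned:
                    continue
                group = {i} | {j for j, p in out.items()
                               if p >= threshold and j not in assigned}
                assigned |= group
                groups.append(group)
            return groups

    node = NodeSketch()
    node.learn("the cat sat on the mat the cat ran".split())
    print(node.temporal_groups())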

The normalized memberships become estimates of the probability of membership in each temporal group. The normalized degrees of membership are the output of the node.
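A corresponding sketch of the sensing/inference output follows; the closeness function here is only a placeholder for whatever distance metric a real node would use between stored patterns and the input pattern.

    # Sketch of the sensing/inference output (Section 3.3.2): group membership as the
    # maximum closeness inside each temporal group, normalized to sum to one.
    def membership_vector(input_word, groups, closeness):
        raw = [max(closeness(input_word, w) for w in group) for group in groups]
        total = sum(raw) or 1.0
        return [m / total for m in raw]     # components sum to unity

    # Toy closeness: 1.0 for an exact match, 0.1 otherwise (illustrative assumption).
    groups = [{"cat", "dog"}, {"ran", "sat"}]
    print(membership_vector("cat", groups, lambda a, b: 1.0 if a == b else 0.1))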

The final output is an estimation of the probability of membership of the current input in each temporal group of the node, which at the top level corresponds to the class of the document.

4. Ongoing Research

At this moment we are conducting research in using the model for semi-supervised document classification.

Acknowledgments

This work was supported by the department of Computing and Information Sciences and Engineering of the University of Puerto Rico and CAHSI with NSF Grant No. CNS- 0837556.

References

1. Baldi Pierre, Frasconi Paolo, Smyth Padhraic. Modeling the Internet and the Web: Probabilistic Methods and Algorithms. John Wiley, 2003.
2. Dileep George. How the brain might work: A hierarchical and temporal model for learning and recognition. Doctoral Dissertation, 2008.
3. Hawkins Jeff and Blakeslee Sandra. On Intelligence. Henry Holt and Company, New York, 2004.
4. Hawkins Jeff and Dileep George. "Hierarchical Temporal Memory – Concepts, Theory, and Terminology", Numenta, Inc., 2006. http://www.numenta.com/Numenta_HTM_Concepts.pdf.
5. "Problems That Fit", Numenta, Inc., 2007. http://www.numenta.com/fordevelopers/education/ProblemsThatFitHTMs.pdf.
6. "Hierarchical Temporal Memory – Comparison with Existing Models", Numenta, Inc., 2007. http://www.numenta.com/fordevelopers/education/HTM_Comparison.pdf.

Low Power Software Techniques for Embedded Real Time Operating Systems

Daniel Mera and Nayda Santiago Electrical and Computer Engineering Department University of Puerto Rico, Mayagüez Campus Mayagüez, PR 00681 Emails: [email protected], [email protected]

Abstract - Power consumption is an important constraint in embedded systems running a real time operating system (RTOS) [6]. Many approaches have been developed to reduce the power consumption in embedded systems at the hardware level, at the operating system level, and at higher levels of abstraction [13]. This study proposes an evaluation of the significance of the joint effect of all possible factors of interest and their interaction in the reduction of power consumption of RTOS running on small and medium scale embedded systems. Design of experiments (DOE) techniques were used to identify the impact on the power consumption of the system; this allows obtaining sounder results in order to generalize about the effect of the operating system level optimizations on other architectures. A case study is presented with three types of optimization oriented to dynamic frequency scaling (DFS) and two oriented to memory management, applied to the FreeRTOS real time operating system. Two types of workloads were used for the work, the MiBench benchmarks and a uGC Code, with three different platforms (ATMEGA323, MSP430F149, and LM3S811) and a full factorial experiment design.

1. Introduction

Power consumption is an important constraint in embedded systems running real time operating systems (RTOS), due to limited battery life time, heat dissipation of electronic components, size constraints, and costs [13]. RTOS are widely used in small-scale and medium-scale embedded systems applications [1]. Power consumption can be reduced via various hardware optimization techniques such as transistor resizing, the design of dynamically variable voltage hardware [8], and VLSI techniques for low power and frequency control methods. Another way to reduce power consumption in embedded systems is through optimizations at the software level [5]. Instruction level [12], compiler level [14], system level [13], and high level optimizations [5] have been proposed in the literature. The related work gives a brief description of the low power techniques level by level. When an RTOS runs on an embedded system, optimizations at the operating system level have been proposed to reduce power consumption, such as I/O management, memory management schemes to make energy efficient memory allocations [2], dynamic power management (DPM) techniques to dynamically scale the voltage and frequency of the hardware, low power modes, scheduling algorithms [9], and energy characterization of embedded RTOS [1, 6]. Our work differs from that found in the literature [6] because we evaluate the effect of changing all possible factors of interest: platforms, benchmarks, operating system level optimizations oriented to DFS and memory management, and their interaction in the reduction of power consumption of RTOS running on small and medium scale embedded systems. Statistically sound conclusions were obtained with design of experiments (DOE) techniques [11]. The results show that the operating system optimizations impact the reduction of power consumption depending on the platform and the workload. We found that there is significant interaction between all factors in the power consumption of the system.

This document is organized as follows: Section 2 discusses related work in low power techniques for embedded software, Section 3 illustrates the methodology implemented, and Section 4 shows the results obtained and the analysis. Finally, conclusions are presented.

2. Related Work

Different approaches have been used for power reduction. At the instruction level, Tiwari et al. [12] describe an approach based on measures of the current and power consumed by each instruction. They showed that instructions involving memory access are much more power expensive than instructions that involve register accesses. The authors developed other optimizations such as instruction reordering and energy cost driven code generation.

In terms of compiler based approaches, in [14] the authors analyzed optimizations oriented to increase performance in terms of their effect on power consumption in memory systems. Their experiments with the MIPS R10000 processor demonstrated that performance optimization techniques are not necessarily good in terms of power savings. At a higher level, Dalal et al. in [5] used loop unrolling, function inlining and branching transformations on a 32-bit ARM processor. In their results, the branching transformation gave the maximum improvement in power. At the real time operating system level, Krishna et al. [8] built a low power voltage-scheduling algorithm based on the worst case execution times (WCETs) of the tasks, and results showed an energy saving of 56%. Dick et al. in [6] developed an embedded system RTOS/application energy profile to use the RTOS in a more energy efficient way. In [1] the authors proposed an energy characterization at different processor speeds to analyze the energy overhead of an embedded operating system independently of the application.

In contrast to the previous work found in the literature, our research is focused on evaluating the significance of changing all possible factors of interest (platforms, benchmarks, operating system level optimizations oriented to DFS and memory management) and their interaction in the reduction of power consumption of RTOS running on small and medium scale embedded systems.

3. Methodology

3.1 Target Platforms

Some of the features of the target architectures are as follows. The LM3S811 is a 32-bit RISC architecture with deep-sleep modes, 64 KB of single-cycle flash, and an 8 KB single-cycle SRAM. The MSP430F149 is a 16-bit RISC architecture with five power-saving modes, 60 KB of flash memory, and 2 KB of RAM; typical applications include industrial control. Finally, the ATMEGA323 has an 8-bit RISC architecture, six sleep modes, 32K bytes of flash, and 1K byte of EEPROM.

3.2 The Real Time Operating System

FreeRTOS is an RTOS designed for embedded microcontrollers. Some features include preemptive, cooperative and hybrid configuration options, and a code structure written in C [3].

3.3 Benchmarks

The benchmarks selected for our study were bitcount, basicmath, and the dijkstra algorithm [7]. Bitcount and basicmath are categorized as computation intensive applications, where ALU operations dominate performance, and the dijkstra algorithm belongs to the I/O intensive applications. Our uGC Code is firmware that controls a micro gas chromatograph designed by the Engineering Research Center for Wireless Integrated Microsystems [15]. It performs real-time data acquisition, data analysis, and wireless communication. Together, these workloads stress the overall system at the computation, memory and I/O level.

3.4 The Operating System Level Optimizations

The optimizations oriented to DFS chosen were: DFS based on CPU usage, a DFS predictive strategy based on WCET, and DFS based on task priority [9][10]. In terms of memory management, static and dynamic memory allocation were considered [2]. Below is a brief description of the optimizations. In [10] the authors propose a predictive dynamic voltage scaling (DVS) scheduling strategy with two approaches. The first one is based on assigning to each task a processor clock frequency according to the task priority; that is, the task with the lowest priority is assigned a low clock frequency while still completing all task deadlines. The second approach is based on reducing the clock frequency considering the CPU usage of the processes, which is a measure of the workload of the instances of the active tasks in a determined time interval. In [9], a low power predictive scheduling strategy was proposed. The mechanism predicts the execution time of the instances of the tasks by averaging the previous instances; if the task is predicted to finish its execution before its WCET (worst case execution time), the processor clock frequency is slowed down (DFS). In terms of memory management optimizations, we chose static and dynamic memory allocation [2]. Dynamic memory allocation, unlike static allocation, is the mechanism by which the runtime searches the heap for a block big enough to satisfy the request to store data for a given task.

3.5 Instrumentation

The power dissipation in embedded systems based on CMOS circuits satisfies approximately equation (1) [4]:

$P = \alpha \, C_L \, V_{dd}^2 \, f_{clk}$,   (1)

where $\alpha$ is the probability of switching in a power transition (the activity factor), $C_L$ is the loading capacitance, $V_{dd}$ the supply voltage and $f_{clk}$ the clock frequency. Equation (1) shows that the power consumption is proportional to the clock frequency. The purpose of frequency scaling strategies is to reduce the clock frequency during runtime in order to save power while meeting the deadlines of the tasks.
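A short worked example of equation (1): with the activity factor, load capacitance and supply voltage held fixed, halving the clock frequency halves the dynamic power, which is the effect the DFS strategies exploit. The numeric values below are illustrative only, not measurements from this study.

    # Worked example of equation (1), P = a * C_L * V_dd^2 * f_clk (illustrative numbers).
    def dynamic_power(activity, c_load, v_dd, f_clk):
        return activity * c_load * v_dd ** 2 * f_clk

    p_full = dynamic_power(0.2, 20e-12, 3.3, 8e6)   # 8 MHz clock
    p_half = dynamic_power(0.2, 20e-12, 3.3, 4e6)   # DFS slows the clock to 4 MHz
    print(p_full, p_half, p_half / p_full)          # the ratio is 0.5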

Given that we physically measured the current variations (the voltage is constant), our instrumentation circuit uses a 1 ohm resistor in series with the power supply of the microcontrollers. The signal was properly filtered using an instrumentation amplifier (AD620). Data acquisition was done using an oscilloscope connected to a LabVIEW virtual interface (VI) on the host via the GPIB protocol, and the signal was discretized at 100 MHz according to the Nyquist law. Finally, at the end of the execution time of the benchmarks, the average power consumed in that interval was calculated as in equation (2):

$P_{avg} = \frac{1}{T} \int_{0}^{T} v(t)\, i(t)\, dt$.   (2)

3.6 Design of the Experiment

Design of experiments (DOE) techniques [11] refer to the process of planning the experiment so that it results in valid and objective conclusions. In this case, the four factors chosen were: the platforms, the workloads, the optimizations oriented to DFS, and the memory management. The response variable selected was the power consumption of the target platforms, and a full factorial experimental design with four replicates was selected. One of the advantages of this experiment is that it shows the impact of the factors and the interaction between them. We performed the experiment and the statistical analysis of the data using ANOVA. Finally the conclusions are presented. Table 1 and Figure 1 show the full-factor factorial design for each platform.

Table 1: Full Factorial Design for Operating System Level Optimizations on Each Platform: Power Consumption Averages (mW)

Optimization | MSP430F149 (MiBench / uGC Code) | ATMEGA323 (MiBench / uGC Code)
Non-Optimized (1) | 14.437 / 19.654 | 19.045 / 23.257
Dynamic Memory Allocation - DFS based on CPU Usage Strategy (2) | 13.310 / 15.035 | 11.481 / 19.670
Dynamic Memory Allocation - DFS Predictive Strategy based on WCET (3) | 13.443 / 15.432 | 10.867 / 16.907
Dynamic Memory Allocation - Constant Frequency (4) | 12.601 / 23.776 | 15.241 / 17.197
Dynamic Memory Allocation - DFS with Tasks Priority (5) | 15.276 / 16.679 | 12.086 / 15.166
Static Memory Allocation - DFS based on CPU Usage Strategy (6) | 15.550 / 20.747 | 21.159 / 23.086
Static Memory Allocation - DFS with Tasks Priority (7) | 14.989 / 16.662 | 23.108 / 22.989
Static Memory Allocation - DFS Predictive Strategy based on WCET (8) | 16.138 / 15.740 | 16.610 / 22.407
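The run matrix of such a full factorial design can be sketched as below. The factor levels and their coding are assumptions made for illustration, since the exact coding used in the study is only reflected indirectly through Table 1.

    # Sketch of the full factorial run matrix of Section 3.6: every combination of platform,
    # workload, DFS strategy and memory-allocation scheme, with four replicates per cell.
    from itertools import product
    import pandas as pd

    platforms  = ["ATMEGA323", "MSP430F149", "LM3S811"]
    workloads  = ["bitcount", "basicmath", "dijkstra", "uGC"]
    dfs        = ["constant", "cpu_usage", "wcet_predictive", "task_priority"]
    memory     = ["static", "dynamic"]
    replicates = range(1, 5)

    runs = pd.DataFrame(list(product(platforms, workloads, dfs, memory, replicates)),
                        columns=["platform", "workload", "dfs", "memory", "replicate"])
    runs["power_mW"] = float("nan")   # filled in from the measured averages
    print(len(runs))                  # 3 * 4 * 4 * 2 * 4 experimental runs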

4. Results and Analysis

The analysis of variance (ANOVA) is a statistical test used to analyze the significance of the factors over the response variable of a system. The significance level or p-value is selected according to the type of problem. In our study a p-value of 0.05 was chosen, hence there is a 5% of probability to reject the null hypothesis. The null hypothesis that we had is that the selected factors do not have a significant impact in power consumption of medium and small scale embedded system running RTOS. The analysis of variance will demonstrate to us whether this assumption is statistically correct. Figure 2 summarizes the results of ANOVA for the experiment, it show that statistically there is a significant effect (p < 0.05) of the platforms, benchmarks, operating system level optimizations oriented to DFS, and memory management and their interaction in reduction of power consumption. Figure 2: Significance of the factors in the full factorial design: the p-value.

40 Brodersen. Low power cmos digital design. IEEE Journal of Solid State Circuits, 27:473-484, 1995. 5. V. Dalal and C. Ravikumar. Software power optimizations in an embedded system. Proceedings of the 14th International Conference on VLSI Design (VLSID '01), pages 254-259, Washington, DC, USA, 2001. 6. R. P. Dick, G. Lakshminarayana, A. Raghunathan, and N. K. Jha. Analysis of power dissipation in Figure 1: Power Consumption Averages on embedded systems using real-time operating LM3S811 Microcontroller systems. IEEE transactions on computer-aided design of integrated circuits and systems, 5. Conclusions 22(5):615-627, May 2003. 7. M. Guthaus, J. Ringenberg, D. Ernst, T. Austin, Based on the results obtained, dynamic memory T. Mudge, and R. Brown. MiBench: A free, allocation impacted significantly the reduction of power commercially representative embedded benchmark IEEE International Workshop on Workload on the ATMEGA323. The combination of DFS suite. Characterization optimizations and memory management optimizations . WWC-4. 2001, pages 3 – in some cases of the MSP430F149, LM3S811 and 14, Dec. 2001. ATMEGA323 were not significant in power reduction. 8. C. M. Krishna and Y. H. Lee. Voltage-clock- The operating system optimizations impacted the scaling adaptive scheduling techniques for low IEEE Trans. reduction of power consumption depending of the power in hard real-time systems. Comput. platform. We have found that there is significant , 52(12):1586-1593, 2003. interaction between all factors in the power 9. P. Kumar and M. Srivastava. Predictive strategies consumption of the system. for low-power rtos scheduling. In ICCD '00: proceedings of the 2000 IEEE International

Conference on Computer Design, pages 343-348, 6. Future Work Los Alamitos, CA, USA, 2000. 10. C. Lim, H. T. Ahn, and J. T. Kim. Predictive dvs Our future work includes testing different RTOS in scheduling for low-power real-time operating order to find whether the OS affects power system. In ICCIT '07: Proceedings of the 2007 consumption. International Conference on Convergence Information Technology, pages 1918-1921, 7. Acknowledgments Washington, DC, USA, 2007. 11. D. Montgomery. Design and analysis of This work has been supported by NSF Grant No. Experiments. Wiley, New York, 6th edition, 2004. CNS-0837556 and the Engineering Research Centers 12. V. Tiwari, S. Malik, A. Wolfe, and M. T. C. Lee. Program of the National Science Foundation under the Instruction level power analysis and optimization Award Number ERC-9986866. of software. In Proceedings of the Ninth International Conference on VLSI Design, pages References 326-328, Jan 1996. 13. O. S. Unsal and I. Koren. System-level power- 1. A. Acquaviva, L. Benini, and B. Ricco. Energy aware design techniques in real-time systems. characterization of embedded real-time operating Proceedings of the IEEE, 91(7):1055-1069, July systems, chapter 4, pages 53-73. Kluwer Academic 2003. Publishers, Norwell, MA, USA, 2003. 14. J. Zambreno, M. T. Kandemir, and A. N. 2. D. Atienza, S. Mamagkakis, M. Peon, F. Catthoor, Choudhary. Enhancing compiler techniques for Proceedings of the J. Mendias, and D. Soudris. Power aware tuning memory energy optimizations. Second International Conference on Embedded of dynamic memory management for embedded Software real-time multimedia applications. In Proceedings , pages 364-381, London, UK, 2002. of the XIX Conference on Design of Circuits and 15. E. Zellers, S. Reidy, et al. An integrated Integrated Systems, DCIS'04, volume 2, pages micro-analytical system for complex vapor International Conference on Solid- 375-380, Bourdeaux, France, Nov. 2004. mixtures. State Sensors, Actuators and Microsystems, 3. R. Barry. The FreeRTOS Reference Manual, 2009. 4. A. P. Chandrakasan, S. Sheng, and R. W. pages 1491-1496, June 2007.

41 Object Segmentation in Hyperspectral Images

Susi Huamán De la Vega, MS Graduate Student UPRM, [email protected] Dr. Vidya Manian, Assistant Professor UPRM, [email protected] Laboratory for Applied Remote Sensing and Image Processing University of Puerto Rico at Mayagüez, P. O. Box 9048, Mayagüez, Puerto Rico 00681-9048

Abstract image into multiple sub-regions or parts according to The interest for object segmentation in hyperspectral their properties, e.g. intensity, color, and texture. Image images is increasing and many approaches have been segmentation consists of dividing an image into regions proposed to solve this problem. In this research we that represents objects that either have some measure of developed an application that uses both active contours homogeneity within them, or have some measure of and graph cut approaches for objects segmentation in contrast with the objects or their edges [4] [5]. hyperspectral images. Active contours have been widely Many approaches have been developed and used as an attractive image segmentation method proposed for the hyperspectral image segmentation. We because they produce sub-regions with continuous developed an application that uses both active contours boundaries. On the other hand, graph cuts have and graph cut approaches for objects segmentation in emerged as a powerful optimization technique for hyperspectral images. This is an extension of the minimizing energy functions and avoiding the problems algorithm presented at [6]. of local minima inherent in other approaches. The The primary aim of this application is to improve combination of those two models can show robust the accuracy of objects segmentation results in objects segmentation capability due to their ability to hyperspectral images, using both spatial and spectral jump over local minima and provide a more global information. Our application shows results with enough result. Graph cuts guarantee continuity and lead to accuracy and versatility, which will can be applied in smooth contours free of self-crossing and uneven many fields and it will represent an important spacing problems. In addition, our application also advantage to the segmentation field on hyperspectral uses both spatial and spectral information from the images. hyperspectral image, and segments more than one 2. METHODS object. Algorithm testing is done with real hyperspectral images. This approach can be applied in 2.1 Proposed Application Description many fields and it will represent an important The proposed application segments all the objects advantage to the object segmentation field. in the image. The application is able to show the objects

contours based on certain image’s characteristics or 1. Introduction properties. The application uses both spectral and The interest in hyperspectral images analysis is spatial information from images. Figure 2.1 shows the increasing and it is playing an important role in many general block diagram for our approach. applications of diverse fields. One of the reasons can be principally because the hyperspectral images are characterized by large amounts of data taken at narrow and contiguous spectral bands [1] providing important information, and helping to discriminate better between different objects or regions in a hyperspectral image. The hyperspectral image data is increasingly available from a variety of sources, including commercial and government satellites; this is accompanied by an increase in spatial resolution and in the number of spectral channels [2]. Hyperspectral Figure 2.1: Block diagram for objects segmentation on Hyperspectral imaging technology has recently found applications in Images. many fields, including agriculture, archeology, biology, defense, forensics, medicine, pharmaceuticals, remote 1. Obtaining the initial contour based on spectral sensing, and target detection [3]. signature. One important task in image analysis is Most of the methods based on active contour segmentation, which is simply to partition or divide an require an initial contour. The basic idea is to start with initial boundary shapes represented in a form of closed

42 curves, i.e. contours, and iteratively modify them by 2. Representing the image as an adjacency graph G. applying shrink/expansion operations according to the In this step the hyperspectral image is transformed constraints of the image [7]. It is important to give a as an edge weighted graph ( , ), where each pixel correct position for the initial contour, because most of within the hyperspectral image is mapped to a vertex the methods are sensitive to initial contour [8], and the , if two pixels are adjacent,𝐺𝐺 𝑉𝑉 𝐸𝐸 there exits an edge segmentation results depend on it. ( , ) that has a nonnegative weight, if ( , ) In this application, to the initial contour is obtained the𝑣𝑣 ∈ weight𝑉𝑉 is zero. The weights used in this approach is based on spectral signature for which, some points of as𝑢𝑢 follow:𝑣𝑣 ∈ 𝐸𝐸 𝑢𝑢 𝑣𝑣 ∄𝐸𝐸 the desired object are selected. ( , ) = ( ( , ) + ( , ))2 Given and vectors that represents the Where: 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 𝑆𝑆𝑆𝑆𝑆𝑆 indexes of the selected points on X and Y direction ( ,𝐸𝐸) 𝑖𝑖 =𝑗𝑗 ( 𝑔𝑔 _ 𝑖𝑖 𝑗𝑗( )/ 𝑔𝑔 ( 𝑗𝑗 𝑖𝑖 _ ( ))) respectively.𝑃𝑃𝑃𝑃 With those𝑃𝑃𝑃𝑃 points, we obtain the real 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 𝑖𝑖𝑖𝑖 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 𝑘𝑘 𝑖𝑖𝑖𝑖 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 𝑔𝑔 (𝑖𝑖 ,𝑗𝑗 ) = 𝑒𝑒𝑒𝑒𝑒𝑒 (−𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 _ ( 𝑖𝑖)/ 𝑚𝑚𝑚𝑚𝑚𝑚 ( 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 _ ( 𝑘𝑘))) pixels from the hyperspectral images that represent the 𝑆𝑆𝑆𝑆𝑆𝑆 𝑖𝑖𝑖𝑖 𝑆𝑆𝑆𝑆𝑆𝑆 𝑘𝑘 𝑖𝑖𝑖𝑖 𝑆𝑆𝑆𝑆𝑆𝑆 desired object. 𝑔𝑔Where𝑖𝑖 𝑗𝑗 , 𝑒𝑒𝑒𝑒𝑒𝑒 −_ 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔( ) is𝑖𝑖 the𝑚𝑚 𝑚𝑚image𝑚𝑚 𝑔𝑔𝑔𝑔𝑔𝑔 pixel𝑔𝑔 intensity𝑘𝑘 = ( , ,:) gradient at location k in the direction of on the 𝑖𝑖𝑖𝑖 𝑔𝑔𝑔𝑔𝑔𝑔𝑔𝑔 gray scale of𝑔𝑔𝑔𝑔𝑔𝑔 hyperspectral𝑔𝑔 𝑘𝑘 images, and ( ) where 𝑜𝑜𝑜𝑜𝑜𝑜 is𝑜𝑜𝑜𝑜 𝑜𝑜the 𝐼𝐼𝐼𝐼𝐼𝐼original𝑃𝑃𝑃𝑃 hyperspectral𝑃𝑃𝑃𝑃 image _ is the image pixel intensity gradient at location𝑖𝑖 → 𝑗𝑗k in the representation. is a specific pixel in the image, 𝑖𝑖𝑖𝑖 𝑆𝑆𝑆𝑆𝑆𝑆 𝐼𝐼𝐼𝐼𝐼𝐼 direction of in the SAD images 𝑔𝑔𝑔𝑔𝑔𝑔of the𝑔𝑔 desired𝑘𝑘 where represents𝑖𝑖𝑖𝑖 the position on the , and respectively.𝑝𝑝𝑝𝑝 The spectral angle distance object. 𝑖𝑖 → 𝑗𝑗 between𝑖𝑖𝑖𝑖 the desired object and is given𝑖𝑖 by: −𝑟𝑟 𝑟𝑟 The type of connectivity is an 8-connectivity 𝑗𝑗 − 𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐𝑐 graph, which means that each vertex in the graph is 𝑖𝑖𝑖𝑖 ( ) corresponding to a pixel , has edges connecting it to ( , ) = cos 1( 𝑝𝑝𝑝𝑝 ) ( ( )) 𝑇𝑇 ( ) − 𝑖𝑖𝑖𝑖 its 8 neighboring vertices, which correspond to the 8 𝑖𝑖𝑖𝑖 �𝑝𝑝𝑝𝑝 ∗ 𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜 � 𝑆𝑆𝑆𝑆𝑆𝑆 𝑝𝑝𝑝𝑝 𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜𝑜 𝑖𝑖𝑖𝑖 neighboring pixels of . 𝑝𝑝 Then, for all we𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛 calculate𝑚𝑚𝑚𝑚𝑚𝑚𝑚𝑚 𝑜𝑜𝑜𝑜𝑜𝑜 the𝑜𝑜𝑜𝑜𝑜𝑜 angle∗ 𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛 between𝑝𝑝𝑝𝑝 3. Dilate current boundary.𝑝𝑝 and the mean𝑝𝑝 of𝑝𝑝𝑖𝑖𝑖𝑖 the desired object represented by The next step is to dilate the current boundary into . This step permits to obtain a new image in two 𝑖𝑖𝑖𝑖 its contour neighborhood (CN) with an inner boundary 𝑝𝑝dimensions,𝑝𝑝 in which the desired object is distinguished and an outer boundary. The size of the CN can be 𝑜𝑜𝑜𝑜𝑜𝑜better𝑜𝑜𝑜𝑜 𝑜𝑜in gray scale representation as we can see in specified by the user or we can put it as a fixed Figure 2.2. variable. The dilation size is an important parameter, because in some cases the accuracy of segmentation result depends on it. Note that for the first iteration the current boundary is the initial contour obtained in the first step. According to [6], dilation process has several a) b) c) Figure 2.2: Regions selected from “PR Hyperspectral Science Areas” objectives. As dilation process generates a CN of the captured by AISA Eagle Sensor with 128 bands. a) True color image current contour, it makes the algorithm capable of from ENVI, b) Gray scale of the hyperspectral image, and c) Image jumping over local minima within this CN. 
The authors formed by SAD. also explained that the size of the CN can be selected

Initial boundary is obtained applying angle metric, based on the size of the object to be segmented and the A threshold value to compare all obtained results for amount of noise in the data. Later we will show and SAD is specified, all the angles between minimum discuss the results using different sizes (Figure 2.4) for dilation process. angle and maximum angle plus threshold are chosen as a desired object, and we give them the value of one, and the value for the others is zero as we can see in Figure 2.3. Finally, we extract the contour of these binary images. a) b) c) Figure 2.4: Dilation process. a) Initial contour. b) Size of CN=3. c) Size of CN=10.

4. Identify the vertices in inner and outer boundary. Another objective of the dilation process is that it a) b) Figure 2.3: a) Binary image obtained based on SAD with 15 generates an inner boundary that corresponds to the thresholds value, and b) Initial boundary for images a). multiple sources, and outer boundary corresponding to

43 multiple sinks in the corresponding graph as show in needs to prove the same algorithm with other images Figure 2.5. we need to do some changes that in certain cases is not feasible or easy to do. Our application works with many hyperspectral images. It depends on the initial parameters that user enters. Moreover, algorithms for object segmentation often segment only a specific object from its background; our application can segment more than one object depending of the input Figure 2.5: Vertices in inner, current, and outer boundaries. parameters. 5. Compute the s − t minimum cut. Treating the pixels on the inner boundary as 3. RESULTS multiple sources and the pixels on the outer boundary 3.1 Data Set Description as multiple sinks as show in Figure 2.5, the goal of this Algorithm testing is done with remote sensing step is to compute the best cut that would give an hyperspectral images such as: Hyperspectral Data optimal segmentation (Figure 2.6). This is the problem Imagery Collection Experiment (HYDICE), which of finding the global minimum contour within CN; captures the information in 210 contiguous bandwidths formulated as a multi-source multi-sink s−t minimum from the visible to shortwave infrared (400-2500 nm). cut problem on the graph. The cost of a cut is defined as Airborne Visible/Infrared Imaging Spectrometer the sum of the costs of edges that it severs. (AVIRIS), which has 224 contiguous spectral channels with wavelengths from 400 to 2500 nanometers. ( , ) = ( , ) , The SOC-700 hyperspectral camera has a spectral 𝑆𝑆 𝑇𝑇 � 𝑐𝑐 𝑢𝑢 𝑣𝑣 resolution of 4 nm with 120 bands and a spectral range 𝑢𝑢∈𝑆𝑆 𝑣𝑣∈𝑇𝑇 The s − t minimum cut problem is to find a cut in from 400 to 900 nm. It is available at the Laboratory for G that separates s and t as show in figure 2.6, with the Applied Remote Sensing and Image Processing smallest weight [4] [5]. (LARSIP). Finally we used hyperspectral images taken

by AISA Eagle Sensor. Those images were taken from different spatial resolution, and they have 128 spectral bands.

3.2 Experimental Results

Figure 2.6: Computed new boundary. In this section, we present our experimental results. We have used different input parameters to compare the There is an important correspondence between segmentation results. We chose points that represent the flows and cuts in networks, the max-flow min-cut pixels of the desired object, then we set values for the theorem states that: “The maximum flow from a vertex s threshold to obtain the initial contour, and we also set to vertex t, |f|, is equal to the value of the weight c(s, t) the size of dilation process. Our application is of the minimum cut separating s and t”; [9] formulated implemented in Matlab R2009b, and we also call some it. It makes it possible to solve s-t minimum cut libraries implemented in Visual Studio 2008 C++. problem by using existing max-flow algorithms.

6. Return to step 2 until the algorithm converges. This algorithm iteratively replaces a contour with a global minimum within the CN of the contour until the objective is achieved. This approach is guaranteed to Figure 3.1: Regions selected from “PR Hyperspectral Science Areas” converge by the theorem that was demonstrated in [6]: captured by AISA Eagle Sensor with 128 bands. Segmentation “Within a finite data set, the graph cuts based active generated (bands 54 35 15 RGB) using 5 points of the object of interest, threshold=10 and CN=5. contour will either converge or oscillate between several results with the same weight after finite iterations”, this theorem is known as the convergence theorem.

2.2 Contribution of this project

In this application, the object segmentation a) b) algorithm for 2-d and 3-d images proposed by [6] is Figure 3.2: Region selected from Fake Leaves hyperspectral image improved and extended to hyperspectral images. Object captured by SOC-700. a) True color image from ENVI, b)Segmentation generated (bands 40 35 15 RGB)using 10 points for segmentation algorithm usually is built to work only the first object(blue) and 5 point for the second object(red), for both with a specific hyperspectral image and when someone threshold=5 and CN=5.

44 4. CONCLUSIONS Despite recent advances in hyperspectral image processing, automated object segmentation from hyperspectral image data is still an unsolved problem. We developed an application that uses an algorithm that combines graph cut and active contour methods to segment objects in hyperspectral images. We proved that when using appropriate parameters we can obtain more accurate results. Our results with different sets of Figure 3.3: Regions selected from Washington D.C. Mall with hyperspectral images confirmed that our application 191bands. Segmentation generated (bands 177 171 136 RGB) using 15 points of the object of interest, threshold=15 and CN=5. work with many different set of hyperspectral images.

Acknowledgements This work was supported by the Department of Defense under grant HM1582-08-1-0047 and by the Center for Subsurface Sensing and Imaging Systems, University of Puerto Rico at Mayaguez. Presentation of this paper was supported in part by NSF Grant CNS- 0837556. We would like to thank Ning Xu who shared his code a) b) used in [6]. Figure 3.4: Region selected from HYDICE Forest with 210 bands. Segmentation generated (bands 49 37 18 RGB) using 6 points of the References object of interest, threshold=10 and a) CN=3.b) CN=5. [1] S.M. Cruz, “Hyperspectral Image Classification Using Spectral Histograms and Semi-Supervised Learning,” University of Puerto Rico - Mayaguez Campus, 2008. [2] A.G.D.S. Filho, A.C. Frery, C.C.D. Araújo, H. Alice, J. Cerqueira, J.A. Loureiro, M.E.D. Lima, M.D.G.S. Oliveira, and M.M. Horta, “Hyperspectral Images a) b) c) Clustering on Reconfigurable Hardware Using the K- Figure 3.5: Regions selected from “PR Hyperspectral Science Areas” Means Algorithm,” Proceedings of the 16th symposium captured by AISA Eagle Sensor with 128 bands. a) True color image on Integrated circuits and systems design, IEEE from ENVI, b) and c) Segmentation generated (bands 110 93 80 Computer Society, 2003, p. 99. RGB) using 5 points for the first object(blue) and 4 point for the [3] A. Erturkerturk and S. Erturkerturk, “Unsupervised second object(red), threshold=10, CN for first object=5 all of this parameters for both figures. b) CN for second object (red) =3. c) CN Segmentation of Hyperspectral Images Using Modified for second object (red) =5. Phase Correlation,” IEEE Geoscience and Remote Sensing Letters, vol. 3, 2006, pp. 527-531. [4] S. Vicente, V. Kolmogorov, and C. Rother, “Graph cut based image segmentation with connectivity priors,” 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA: 2008, pp. 1-8. a) b) c) [5] Y. Boykov and G. Funka-Lea, “Graph Cuts and Efficient Figure 3.6: Region selected from HYDICE Terrain sample scene (bands 168 93 85RGB) with 210 bands. Segmentation generated N-D Image Segmentation,” Int. J. Comput. Vision, vol. using 5 points, threshold=9 all of this parameters for three images, 70, 2006, pp. 109-131. and the difference is a) CN =5. b) CN =7. c) CN =9. [6] Ning Xu, R. Bansal, and N. Ahuja, “Object segmentation using graph cuts based active contours,” 2003 IEEE We tested our application using different data set Computer Society Conference on Computer Vision and obtained by various sensors. Figure 3.1 shows that our Pattern Recognition, 2003. Proceedings., Madison, WI, application works with images that have holes without USA: , pp. II-46-53. any problems. Figures 3.2 and 3.5 show the [7] P.L. Cheolha, “Robust Image Segmentation using Active segmentation results for two different objects. Figures Contours: Level Set Approaches,” North Carolina State 3.4 (b) and 3.5 (b) show that if the dilation size is large University, 2005. [8] Z. Ying, L. Guangyao, S. Xiehua, and Z. Xinmin, in comparison with the desired object the segmentation “Geometric active contours without re-initialization for result is not correct; for this reason, to work with small image segmentation,” Pattern Recogn., vol. 42, 2009, object it is necessary to give a small size of dilation. pp. 1970-1976. Finally, the running time of the segmentation algorithm [9] L. Ford and D. Fulkerson, Flows in Networks, Princeton for Figure 3.3 that is the biggest images tested here is University Press, 1962. 
0.4562 seconds approximately showing that the segmentation algorithm is fast.

45 Hyperspectral Texture Synthesis by 3D Wavelet Transform

Nestor Diaz-Gonzalez and Vidya Manian LARSIP Department of Electrical & Computer Engineering University of Puerto Rico, Mayaguez PR 00680, USA Emails: [email protected], [email protected]

ABSTRACT Texture synthesis techniques that generate an output texture from an example input can be roughly Hyperspectral images have textured regions and in categorized into three classes. The first class uses a many cases there are not sufficient samples to train fixed number of parameters within a compact classifiers. By simulating more samples that are self- parametric model to describe a variety of textures. The similar to the original texture efficient classifiers can be second class of texture synthesis methods is non- trained and used for classification. These tools also parametric, which means that rather than having a fixed provide means for generating synthetic hyperspectral number of parameters, they use a collection of images, without the expense of data collection. The first exemplars to model the texture. The third, most recent step toward this goal is to develop a hyperspectral class of techniques generates textures by copying whole texture synthesis algorithm that efficiently combines patches from the input. both the spectral information and the spatial variability in the original image. We propose a method for texture The problem of texture synthesis can be formulated synthesis using 3D wavelet transform (3DWT). The as follows: let us define texture as some visual pattern synthesis is done by taking the 3D wavelet transform of on an infinite 2-D plane which, at some scale, has a the original sample image. The wavelet coefficients are stationary distribution [1]. It could also be defined as a representation of the hyperspectral image as a the structural pattern of surfaces which is homogeneous compact code. These coefficients are then synthesized in spite of fluctuations in brightness and color. In our by an image quilting algorithm. Then the synthesized case we want to synthesize hyperspectral texture coefficients are reassembled and the inverse 3DWT is images, which mean that each of the pixels in the image performed. Results of the synthesis are presented using will be a vector with the spectral information of the remote sensing hyperspectral images. Support Vector material that’s been sensed. Machines (SVM) are used to classify the texture properties of the synthesized textures. Original textures Given a finite sample from some texture, an image, are used as a training set and the synthetic textures as the goal is to synthesize other samples from the same the testing set. texture. The usual assumption is that the sample is large enough that it somehow captures the stationarity of the 1. Introduction texture and that the approximate scale of the texture elements is known. Synthesis approaches assume that It is possible to synthesize certain textures the signal samples s1, s2,…,sk emerge from a by directly simulating their physical generation stochastic source Si. The synthesis process estimates processes. Biological patterns such as fur, scales, and the statistics of a hypothetical source Z = Z(S1,…,Sk). skin can be modeled using reaction diffusion and The source Z is statistically “closest” to the samples Si. cellular texturing. Some weathering and mineral phenomena can be faithfully reproduced by detailed In this paper, hyperspectral textures are synthesized simulations. These techniques can produce textures using Complex 3-D Dual-tree Wavelet Transform (3D- directly on 3D meshes so the texture mapping distortion DT-WT). The 3D-DT-CWT uses a dual tree of wavelet problem is avoided. 
However, different textures are features that are assigned as real and imaginary usually generated by very different physical processes components of complex wavelet coefficients. A full so these approaches are applicable to only limited explanation of how the wavelet transform operates can classes of textures. be found in [2, 3].

46 2. Methods 1. Pick size of block and size of overlap. 2.1 Hyperspectral Images 2. Square blocks from the input texture are patched together to create a new texture sample. This is The hyperspectral texture images are patches taken done in raster order. To start pick a block from an image collected over Enrique Reef in La randomly. Parguera, Puerto Rico using the AISA sensor (Figure 1). Each image has 128 bands.

2.2 3D Wavelet Transform

The complex wavelet transform is used as a basis

for nonparametric texture synthesis. The introduction of Figure 2: Overlap for the dotted block. the wavelet decomposition into the synthesis procedure has two advantages. First, it facilitates the measurement of texture statistics at particular scales. The second 3. Search input texture for a block that satisfies advantage is the reduction in computational load, since overlap constraints (above and left), within some the synthesis is done to the coarser scales, the original error tolerance, as it can be appreciate in figure 3. information is represented by fewer pixels, and this The blocks overlap and each new block is chosen allows larger features to be represented by smaller so that it “agrees” with its neighbors in the region neighborhoods. of overlap. 4. Paste new block into resulting texture. 2.3 Image Quilting 5. Use dynamic programming to compute the error surface between the newly chosen block and the This texture synthesis method is presented in [6], old blocks at the overlap region, and then find the but instead of taking just the 3 RGB bands, it’s minimum cost path through the error surface at the modified to take the whole pixel as a vector. A overlap and make that the boundary of the new MATLAB implementation of this algorithm can be block. This is done to reduce blockiness the found in [7]. boundary between blocks. Repeat.

From a previous work presented in [8] we can 2.4 Texture Synthesis Algorithm appreciate that neighbors are highly correlated. So in A general idea of the work presented in [4] was this case the idea is very similar, instead of obtaining implemented. Their concept was extended from color the probability of a pixel given its neighborhood, we images to hyperspectral images by using a 3D wavelet want the probability of a block given its neighborhood. transform and a different texture synthesis algorithm for This makes the algorithm much faster because we’re the unknown wavelet coefficients. For the texture synthesizing a whole block at once. This technique will model, Markov Random Fields (MRF) is used since fail for highly structured patterns due to boundary they have been proven to cover the widest variety of inconsistencies, but for many stochastic textures it useful texture types. MRF methods, model a texture as works pretty well. For this algorithm we have two a realization of a local and stationary random process. variables: block size and overlap size. The block must Each pixel of a texture image is characterized by a be big enough to capture the relevant structures in the small set of spatially neighboring pixels, and this texture. In our case the width of the overlap edge (on characterization is the same for all pixels. one side) was 1/6 of the size of the block. The error was

computed using the L2 norm on pixel values. The error Given the initial sample image Ie of size n x m x l tolerance was set to be within 0.1 times the error of the and the required output size N x M of the image to be best matching block. The algorithm is explained next: synthesized Is, the algorithm proceeds as follows:

Input Texture Neighboring blocks Minimal Error with marked blocks constrained by overlap Boundary Cut

Figure 1: Quilting Texture.

47 × ((l∗k)/2) where k is the number of wavelet Hyperspectral 0 coefficient matrixes (figure 3, steps 1 and 2). In our Texture Image case we had three band pass sub images in each of the spectral quadrants 1 and 2. This gives a total of six band pass images with complex coefficients at each level and a lowpass image. These images are strongly oriented at angles of ±15º, ±45º, and ±75 º. 3D Wavelet 1 3. Synthesize the new matrix using the image quilting Transform algorithm explained in section 2.3. This will result in a matrix of size N/2 × M/2 × l/2. 4. Reassemble the wavelet coefficients. 5. Do the wavelet reconstruction. This will result in N × M × l matrix, which was our goal. Concatenate 2 Coefficients 2.5 Feature Extraction

Edge-based texture descriptors and regional texture descriptors were implemented. These 6 edge based metrics are: the average (f1), the standard deviation Image (f2), the average deviation of gradient magnitude (f3), 3 Quilting the average residual energy (f4), the average deviation of the horizontal directional residual (f5) and, the average deviation of the vertical directional residual (f6). The Spectral Angle Distance (SAD) is used in the feature computation. After the synthetic textures were obtained, each of these features was calculated for every pixel neighborhood. A neighbor size of 7×7 pixels was used. Reassemble 4 Wavelet 2.6 Classification Coefficients After obtaining the features from both, the original and the synthesized hyperspectral images, we proceeded to train a SVM classifier using the original images as the training set, and the synthetic ones as the testing set. The resulting classification results are shown in table 1.

The algorithms presented in Section 2 were

Inverse 3D implemented in Matlab R208b. 5 Wavelet Transform 3. Results: Enrique Reef

Figure 3: Block Diagram of the algorithm

1. Apply the 3D wavelet transform to the input texture Ie. MATLAB Wavelet Toolbox was used in the implementation of some preliminary test for RGB images, but for the hyperspectral textures the MATLAB code found in [5] was used, because it was already implemented for 3 dimensions. 2. Concatenate the sub images obtained (coefficients) Figure 4: Selected Texture primitives from Enrique Reef from the wavelet transform in the 3rd dimension, Image. this will result in a 3D matrix, with size n/2 × m/2

48 Acknowledgment

This work was supported by the Department of Defense under grant HM1582-08-1-0047 and by the Center for Subsurface Sensing and Imaging Systems, (a) (b) University of Puerto Rico at Mayaguez.

Presentation of this poster was supported in part by NSF Grant CNS- 0837556.

References (c) (d) 1. A. Efros and T. Leung. Texture synthesis by non- parametric sampling. In International Conference on Computer Vision, volume 2, pages 1033–8, Sep 1999. 2. Nick Kingsbury. Image processing with complex wavelets. In Phil. Trans. Royal Society London A, (e) September 1999, on a Discussion Meeting on “Wavelets: The Key to Intermittent Information?”, Figure 5: Texture synthesis results. The smaller patches are February 1999. the input textures and to their right are synthesized results. 3. Olivier Rioul and Martin Vetterli. Wavelets and The shown images are the RGB color-composite from the signal processing. IEE Signal Processing Magazine original hyperspectral images, using bands 60, 38 and 19. vol 8, no 4, October 1991. Each texture is generated using single level wavelet decomposition, with block sizes of 7x7. Enrique Reef textures: 4. C. Gallaguer and A. Kokaram, Wavelet Based (a) Mangrove (b) Water (c) Sea grass (d) Sand (e) Coral Reef. Texture Synthesis, In Proceedings of IEEE International Conference on Image Processing, Table 1: Confusion matrix using SVM classifier volume II, 2004, 462-465. 5. Wavelet Software at Brooklyn Poly. C1 C2 C3 C4 C5 Total http://taco.poly.edu/WaveletSoftware/dt3D.html. C1 3013 26 1542 22 773 5376 (Accessed: 2010, March 9th). C2 0 7711 1025 0 0 8736 6. A. A. Efros and W. T. Freeman, "Image Quilting C3 0 830 5386 0 0 6216 for Texture Synthesis and Transfer", SIGGRAPH C4 44 0 30 4542 120 4736 01 C5 680 115 583 0 4838 6216 7. http://mesh.brown.edu/dlanman/courses/en256/Tex th Total 3737 8682 8566 4564 5731 31280 tureSynthesis.zip. (Accessed: 2010, March 9 ). 8. N. Diaz and V. Manian, "Hyperspectral texture 4. Conclusions synthesis by multiresolution pyramid decomposition." In Proceedings of SPIE:

Algorithms and Technologies for Multispectral, A texture synthesis algorithm for hyperspectral Hyperspectral, and Ultraspectral Imagery XV, Vol. images was successfully implemented. For most of the 7334, April 2009. hyperspectral textures used, the algorithm showed an overall classification accuracy of 81%. This metric tells us that most of the pixels agreed with their generating texture, which was what we were trying to prove. Each of texture took around 6 seconds to synthesize. The main limitation of this method is the size of the original hyperspectral texture patch. Since some remote sensing images do not have good spatial resolution, a big enough patch cannot be selected as an input image. It is desired to run more tests in other scenarios, like medical tissue textures.

49 Routing Performance in an Underground Mine Environment

Joseph Gonzalez

Department of Computer Science University of Puerto Rico in Arecibo, Arecibo, PR 00612, USA Email: [email protected] Abstract underground mine that consists of narrow tunnels This paper will describe the methods used to test and turns, the movement of data is constrained. As and analyze the performance of routing schemes in such, wireless sensor networks may behave in an an underground mine environment. It will unexpected or irregular manner. Knowing how a investigate the effects of distance within the mine as network will react is crucial to setting up a reliable well as obstructions to the line of sight. The results route. of the experimentation will be used to discuss In a mine environment, reliability is crucial methods of optimizing routing performance in the because even a small error can endanger lives. The mine environment as well as suggest ways to modify Mine Safety and Health Administration keeps and some current routing algorithms to take the results distributes data about fatalities in the mine into account. environment. Information about the number of fatalities in non‐metal and metal mines between 1. Introduction 2004 and 2008 as well as the causes can be found at [5], and [6] shows this same information for coal

mines. While some of these deaths could have been Wireless sensor networks (WSN) are used in caused by human error, the ability to monitor miner many different common applications, from location, pressure, temperature or air content might monitoring the structural health of a bridge to have allowed some of these 305 lives to be saved. monitoring environmental conditions such as Improved response time to humidity or temperature to military applications accidents or prevention of a dangerous situation can [1][2][3]. In a typical WSN application, a multi‐hop go a long way in decreasing the number of fatalities communication scheme is required since direct in such a dangerous environment. (single‐hop) communication cannot be guaranteed [4]. Hence, a routing scheme is required. A routing scheme is a method by which a node determines the 1.1 Mine Environment multi‐hop path data should take to reach the destination. There are many different routing The environment in question is much different from protocols, each with a unique way of determining the usual settings WSNs are deployed in, and is in the path that a packet should take to get to a specific fact unique to almost any other environment. A destination. Environmental factors such as humidity, mining environment varies greatly from an indoor obstacles and RF interference have environment such as a lab or office building, or a significant impact on network performance, other outdoor environments such as forests or fields especially due to constraints in terms of available that WSNs are frequently used to monitor. Indoor energy, transmission range or data rate. Nodes in a environments are usually well controlled: the walls WSN are limited in the amount of power they have are at form well defined corners, ceiling shape is available [4], and as such need to conserve energy well defined, and space is divided in a well while still reliably transmitting data to a destination. measured manner. Large obstructions rarely exist or Line of sight and distance are two very important are easy to work around. On the other hand, the variables that can affect the reliability of a link mine is full of narrow corridors, sudden turns, between two nodes. In many environments, such as uneven walls of varying composition and a ceiling a lab or an open field, packets are free to travel that changes height frequently. Line of sight can be unhindered within certain bounds, such as the walls difficult to obtain and walls are often too thick to of the lab. However, in an environment such as an send a strong signal through.

50 Even compared to other natural environments, the on a derivative of the 8051 microcontroller, the mine is different. Fields and forests are considerably C8051F120, which operates at 50MHz. more open than a mine: mote placement is much It communicates with the XBee module through a less constrained. Also, the types of obstructions are UART serial interface. The XBee communication different: a tree will deflect a signal differently than module is developed by MaxStream and complies a rock wall or a mine front. Even if the forest does with the IEEE 802.15.4 standard [11]. For in depth have some complex line of sight issues, large trucks technical specifications for the XBee module, see with equipment are rarely driven down the only [11]. For detailed information on the IEE 802.15.4 available path for data and people to travel. These standard, see [12]. mobile obstructions offer yet another complexity to the mine environment. Also, a mine will constantly 2.2 Testing Design grow, and new sensors will have to be added. A Five topologies were designed to test the forest or a field will not expand with the same line of sight and distance factors in the mine. These rapidity as a mine. two scenarios were selected because they are two challenges that every WSN design faces. Optimizing 2. Methodology and Design network performance for one or both of these cases can result in a longer lasting, more reliable network. The long distance tests (LD1, LD2, and LD3) were 2.1 Performance Parameters performed by taking three motes, then moving the Two quantities were taken into account when two end motes further from the middle node. This is determining the performance of a routing topology: in effect the same as using more motes to cover a the reliability of the link and the RSSI, or Received given distance. The number of motes was limited to Signal Strength Indicator. RSSI is a measurement of reduce inconsistencies caused by hardware power in a received signal [10] and is measured in differences and malfunctions. Testing the reception dBm, and is a power ratio as referenced to the of two motes allowed for easy testing while still milliwatt. For some time, it was thought that RSSI allowing multi‐hop routing. Line of sight (LOS1 and was a poor factor to judge performance on. LOS2) was tested by taking the last mote in the However, experiments done by Srinivasan and Levis routing path and moving it such that the line of sight show that for a measured RSSI greater than ‐87dBm, was barely blocked. It was then moved so that the the packet reception ratio was above 85% [10]. amount of rock wall between it and the middle node Experiments done in the mine environment here was increased. Only the reception of the last node in verify this finding: Figure 2.1 shows the the routing chain was tested. experimental findings. The reliability of a link was The initial transmitter remained stationary to determined as the ratio of the number of packets ensure only one degree of freedom. Table 2.1 shows received to the known number of packets sent. the distance between the two nodes for each topology. For all topologies, node 41 transmitted 50 packets to node 75, which routed the packets to node 76. After the packets had been routed, the base station collected RSSI and reliability data from both node 75 and 76. This procedure was performed 30 times for each topology to ensure a normal distribution of data and allow the use of the t‐ and F‐tests. 
The distance between node 75 and 76 did not vary between the second and third long distance test because of extension cord restrictions. Node 76's position in the corridor changed slightly for the third test because it was being held by a human, not sitting on a solid immobile surface. Figure 2.1 Measured RSSI vs Reliability Table 2.1: Distance Between Motes for Each Topolgy (ft)

2.1.2 Hardware Test ID| Node 40 to Node 75|Node 75 to Node 76 The hardware used in this research was the LD1 | 25.62 31.75 Missouri S&T Mote with an XBee communication LD2 | 46.7 60.55 module used to send data. The MS&T Mote is based LD3 | 58.49 60.55 LOS1 | 25.09 15.09 LOS2 | 25.09 26.61.

51 3.1 Results Modification of the AODV algorithm to include a Data was collected for the nodes with addresses reliability metric is actually quite simple: upon 75 and 76. Table 3.1 shows the Mean, Variance, and receipt of a RREQ, check it's RSSI. If the RSSI falls Average Reliability for node 75 for the distance below a certain threshold, do not forward the tests, and Table 3.2 shows the same data for both packet. Assuming that a link is equally reliable in the distance tests and the line of sight tests. For this both directions, this assures that every link in the data, a higher mean (closer to zero) is better. path determined is reliable. This allows AODV to still 3.2 Analysis of Results take advantage of a path with long hops in order to Analysis of the results was done with standard minimize the number of hops without sacrificing statistical F‐ and T‐tests. The F‐test was used to reliability. determine whether equal variances could be assumed when performing the T‐test. For all 4. Conclusions comparisons the null hypothesis was that no topology performed better than the other. Figures The routing environment clearly offers 3.3 and 3.4 show the results of the statistical tests. some environmental challenges that can affect the The statistical analysis of node 75 shows that for performance of any given routing protocol. each set of comparisons the null hypothesis is However, some of these factors can be exploited to rejected. The results of the T‐test show that the First provide a more reliable network. Taking advantage long distance scenario (LD1) performed better than of the environment to increase the signal strength the second (LD2), which was superior to the third for communication can help offset the interference (LD3).Statistical analysis of node 76's behavior also caused by moving equipment or changes in the shows that the null hypothesis is rejected in all mine's shape, both natural and artificial. These cases. For the line of sight topologies, it was conclusions can hold for any environment: determined that LOS1 (the first line of sight interference could indeed be exploited to provide a topology) performed better than LOS2. In the case of stronger signal in any environment. However, the the long distance tests, LD1 was inferior to LD2, mine is so uniquely designed that it can offer many which in turn had worse performance than LD3. such opportunities to exploit this. Modifying current routing protocols to take advantage of these exploits can save much of the time that it would take to develop a new protocol, time which can be spent solving problems such as the damage the harsh environment can have on the sensors. Undoubtedly, there will be other environmental factors that will affect how routing occurs, but this research can be a good starting point in beginning to solve these problems and begin to integrate sensors into an environment where they could save lives.

5. Acknowledgments

The existing code base for the MS&T motes as well as the testing code were written by Mr. Mohammed Rana Basheer of Missouri University of Science and Technology.

The Statistical tests were performed by Nate Eloe, current student of Missouri University of Science and Technology.

3.3 Discussion Presentation of this poster was supported in part by NSF Grant CNS‐ 0837556. 3.3.1 AODV AODV determines the path data takes by broadcasting a RREQ to its neighbors. If a node receives a RREQ, if it is not the destination node it will rebroadcast the RREQ to its neighbors.

52 6. References IEEE Workshop on Mobile Computing Systems and Applications, vol. 2, pp. 90‐100, 1999. [1] I. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, Wireless sensor networks: a survey," [9] J. Fonda, M. Zawodniok, S. Jagannathan, and S. Computer networks, vol. 38, no. 4, pp. 393‐422, Watkins, Development and implementation of 2002. optimized energy‐delay sub‐network routing pro‐ tocol for wireless sensor networks," in Proc. of the [2] J. Lynch and K. Loh, A summary review of IEEE Int. Symp. On Intelligent Control, pp. 119‐124. wireless sensors and sensor networks for structural health monitoring," Shock and Vibration Digest, vol. [10] K. Srinivasan and P. Levis, RSSI is under 38, no. 2, pp. 91‐130, 2006. appreciated," in Proceedings of the Third Workshop on Embedded Networked Sensors (EmNets [3] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, 2006),2006. and J. Anderson, Wireless sensor networks for habitat monitoring," [11] I. Maxstream, XBee," XBee‐PRO OEM RF Modules Product Manual v1.xAx Available Online: [4] M. Vieira, C. Coelho Jr, D. da Silva Jr, and J. da www. maxstream. net. Mata, Survey on wireless sensor network devices," in IEEE Conference Emerging Technologies and [12] J. Gutierrez, M. Naeve, E. Callaway, M. Factory Automation, 2003. Proceedings. ETFA'03, Bourgeois, V. Mitter, and B. Heile, vol. 1, 2003. IEEE 802.15. 4: a developing standard for low‐ power low‐costwireless personal area networks," [5] M. Safety and H. A. (MSHA), IEEE Network, vol. 15, no. 5, pp. 12‐19, 2001. \METAL/NONMETAL DAILY FATALITY REPORT – Year End 2008." [13] D. Russell, Superposition of waves." http://www.msha.gov/stats/charts/mnm2008year http://paws.kettering.edu/~drussell/Demos/super end.asp, 2009. position/superposition.html.

[6] M. Safety and H. A. (MSHA), COAL DAILY [14] S. Siegel, Nonparametric statistics," American FATALITY REPORT – Year End 2008." Statistician, pp. 13‐19,1957. http://www.msha.gov/stats/charts/coal2008yeare nd.asp, 2009 [15] D. Tauritz, Decision Tree for selecting appropriate statistical test for comparing the means [7] D. of Mining Engineering, Experimental Mine." of the results of two stochastic algorithms ." http://mining.mst.edu/research/depexpmine.html, http://web.mst.edu/~tauritzd/courses/ec/cs348fs 2008. 2009/StatTest.pdf.

[16] D. R. Smith and D. Schurig, Electromagnetic [8] C. Perkins and E. Royer, Ad‐hoc on‐demand wave propagation in media with indefinite distance vector routing," in proceedings of the 2nd permittivity and permeability tensors," Phys. Rev. Lett.,vol. 90, p. 077405, Feb 2003.

53 Leveraging Model Fusion to Improve Geophysical Models Omar Ochoa1 Faculty Mentor: Ann Gates1 Center of Excellence for SHaring resources for the Advancement of Research and Education through Cyberinfrastructure (Cyber-SHARE) 1Department of Computer Science University of Texas at El Paso El Paso TX 79968, USA Emails: [email protected], [email protected]

Abstract better accuracy and/or spatial resolution in other areas and depths. There are many sources of data for Earth tomography models. Currently, each of these datasets is processed The objective of this paper is to describe scenarios in separately, resulting in several different Earth models which a geophysicist can leverage the model fusion that have specific coverage areas, different spatial technique to improve the generated models of the Earth. resolutions, and varying degrees of accuracy. These models often provide complementary geophysical 2. Model Attributes information on Earth structure. An approach is to combine the information derived from these models by Geophysical models contain measurements that have fusing the Earth models coming from different datasets. not only different accuracy and coverage, but also In this paper, we describe different scenarios in which a different spatial resolution. These measurements are scientist can leverage the model fusion technique to inherently attributed to the technique used to generate improve Earth’s tomography models, and we discuss each model. For example, first-arrival active seismic the computer science research questions associated with experiments generate models of the Earth with a higher the project. coverage and a higher accuracy closer to the surface, while gravity data generates models of the Earth with a 1. Introduction lower resolution but a broader coverage. It is desirable for a geophysicist to generate a model of the Earth with One of the most important studies in Earth sciences is a high spatial resolution, complete coverage, and high the interior structure of the Earth. There are many accuracy. sources of data for Earth tomography models: first- arrival passive seismic data (from actual earthquakes) 3. Scenarios [5], first-arrival active seismic data (from seismic experiments) [2, 4], gravity data, and surface waves [6]. In this section, scenarios are used to describe how the Currently, each of these datasets is processed primary user, i.e., the geophysicist, utilizes the model separately, resulting in several different Earth models fusion technique to develop a better generation of a that have specific coverage areas, different spatial model of the Earth. resolutions, and varying degrees of accuracy. These models often provide complementary geophysical 3.1 Scenario 1: Identifying a weakness in a fused information on Earth structure (P-wave and S-wave model velocity structures), where combining the information derived from each requires a joint inversion approach. A geophysicist examines a fused model of a specific region of the Earth. The geophysicist determines that Designing such joint inversion techniques presents there is a need to improve in an area of interest of the important theoretical and practical challenges [3]. As a model. An area of interest refers a particular part of the first step, the notion of model fusion [1] provides a fused model in which there is low spatial resolution practical solution: to fuse the Earth models coming area, missing coverage area, or low accuracy area. The from different datasets. Since these Earth models have geophysicist acquires a model that exhibits an different areas of coverage, model fusion is especially improvement in the area of interest and proceeds to fuse important because some of the resulting models provide these two models. better accuracy and/or spatial resolution in certain spatial areas and depths, while other models provide a

Under this scenario, geophysicists continuously improve the collective fused model by refining and complementing the data in areas of the model that they consider weak.

3.2 Scenario 2: Identifying a strong characteristic in a fused model

A geophysicist wants to design an experiment to gather new data with a specific attribute in mind for a certain region. This specific attribute could be high resolution, high accuracy, or coverage at a specific depth. The geophysicist can examine a fused model of a similar region of the Earth and trace back to the origin of this data. By doing so, the geophysicist can determine the best experiment settings for that type of data.

In this scenario, geophysicists draw from the expert knowledge stored in the collective fused model and apply it to the design of a new experiment, thus improving the quality of the data gathered.

3.3 Scenario 3: Experimenting with different models to be fused

A geophysicist identifies an interesting area on a fused model. The geophysicist investigates that area or feature by fusing different combinations of the models that contribute to the fused model, and then analyzes and explores the result. The geophysicist might disregard one or two model characteristics to focus on the area of interest.

Under this scenario, geophysicists explore the fused model and isolate interesting areas or features by fusing only those models that contribute to the area.

4. Current Status and Future Work

The algorithm and formulas that compose model fusion [1] are in the process of being implemented as a prototype. This prototype will be used to analyze the resulting fused model and, if necessary, to modify the approach.

The research questions associated with this study are as follows:

How can the techniques and methods of the iterative design approach used in software development be applied to improving tomography models of the Earth?

What properties related to fused tomography models are of interest to geophysicists with respect to demonstrating validity of the model during the fusion process?

What automated support can be provided to geophysicists to help them design experiments that will lead to the collection of better source data?

Planned future work involves the design and implementation of a software tool that can aid the geoscientist in the creation of fused models. The proposed tool will incorporate model fusion with a graphical user interface to facilitate the scenarios illustrated in this paper. In addition, the work will investigate the use of formal specification of properties to inform the scientist during the process of model fusion. Upon completion of the tool, the author will conduct an evaluation of the tool as it is used by expert geophysicists.

6. Conclusions

This paper describes how a geophysicist can leverage the model fusion technique to improve Earth models. Three scenarios are described: identifying a weakness in a fused model, identifying a strong characteristic in a fused model, and experimenting with different models to be fused. In the first scenario, a geophysicist can identify what data to acquire to create a model that will complement a weakness in the fused model. In the second scenario, a geoscientist identifies a good measurement characteristic in a model, traces back to the origin of this data and, in turn, learns what type of technique generates a strong measurement characteristic in the resulting fused model. In the third scenario, a geoscientist experiments with and analyzes different combinations of models to be fused in order to investigate an interesting area. These three scenarios illustrate how a geophysicist can improve the generation of models of the interior structure of the Earth.

7. Acknowledgements

This work is funded by the National Science Foundation grants Cyber-Share Center of Excellence: A Center for Sharing Cyber-resources to Advance Science and Education (HRD-0734825) and Computing Alliance of Hispanic-Serving Institutions, CAHSI (CNS-0540592).

References

1. Ochoa, O., A. A. Velasco, C. Servin, and V. Kreinovich. "Model Fusion under Probabilistic and Interval Uncertainty, with Application to Earth Sciences." In Proceedings of the 4th International Workshop on Reliable Engineering Computing REC'2010, Singapore, March 3-5, 2010, pp. 81-100.
2. Averill, M. G. A Lithospheric Investigation of the Southern Rio Grande Rift, University of Texas at El Paso, Department of Geological Sciences, PhD Dissertation, 2007.
3. Averill, M. G., K. C. Miller, G. R. Keller, V. Kreinovich, R. Araiza, and S. A. Starks. "Using Expert Knowledge in Solving the Seismic Inverse Problem." International Journal of Approximate Reasoning, 45(3):564-578, 2007.
4. Hole, J. A. "Nonlinear High-Resolution Three-Dimensional Seismic Travel Time Tomography." Journal of Geophysical Research, 97:6553-6562, 1992.
5. Lees, J. M. and R. S. Crosson. "Tomographic Inversion for Three-Dimensional Velocity Structure at Mount St. Helens Using Earthquake Data." Journal of Geophysical Research, 94:5716-5728, 1989.
6. Maceira, M., S. R. Taylor, C. J. Ammon, X. Yang, and A. A. Velasco. "High-resolution Rayleigh Wave Slowness Tomography of Central Asia." Journal of Geophysical Research, Vol. 110, paper B06304, 2005.

Materialography Using Image Processing with OpenCV and C#

Oscar Garcia1, Angelo Reyes2 1 Digital Signal Processing Laboratory, Electrical and Computer Engineering Department, University of Puerto Rico, Mayagüez PR. 2 Engineering Physics, School of Natural and Educational Sciences, University of Cauca, Popayan - Colombia. Emails: [email protected], [email protected]

Abstract

This paper discusses the design and implementation of an algorithm for materialographic analysis and quality control in materials science, through the study of images obtained mainly by SEM, TEM, AFM and optical microscopy. The system was developed using Visual C# and the OpenCV library. The images used correspond to the surfaces of several materials, such as thin films, polycrystalline ceramics and puma hairs, among others. The system uses digital image processing to determine densitometric, field and object parameters. In this work, particle sizes are calculated using a proprietary methodology that allows for more accurate results as compared with traditional methods of intersection and planimetry.

Index Terms: Materialographic analysis, Visual C#, OpenCV, digital image processing.

1. Introduction

The main objective in the field of materials science and engineering is to investigate and set the basis for the relationship between a material's structure and its properties in order to design a material with a predetermined set of properties for a specific application [1]. For the process to have the appropriate controls, it must employ different methods of characterization.

There are several methods of characterization that allow us to know the physical, chemical and biological properties of the material under study. These methods can be classified into destructive and non-destructive (e.g., materialographic analysis, Nuclear Magnetic Resonance Spectroscopy (NMR), High Performance Liquid Chromatography (HPLC), Raman Spectroscopy and Pin on Disk). It should be noted that these techniques are used in very specific applications. Among the aforementioned tools, materialographic analysis has proven noteworthy. This technique consists in determining the properties of the material of interest through image analysis. In general, images of the materials are obtained through a microscopy technique that is selected depending on the material under study [2].

Image analysis was initially performed manually, but with the improvement of information systems, more accurate methods have been developed and, consequently, the time of analysis has been reduced [1, 3]. Materialography deals with the extraction of a material's structure features, creating a link between the structure, its properties and its applications [3].

If the structure of the material lies on the macroscopic scale, e.g. in the geotechnical engineering field, optical microscopy is usually employed. This can be classified as stereoscopic, fluorescence or bright field. If the structure under study lies on the mesoscopic scale, e.g. electro-ceramics or cells, high-resolution optical microscopy or electron microscopy must be used. The latter can be classified into Scanning Electron Microscopy (SEM) or Transmission Electron Microscopy (TEM). Finally, if we are dealing with materials on the nanometric scale, e.g. thin films, it is recommended to use techniques such as Atomic Force Microscopy (AFM) [1].

Materialographic techniques turn out to be very appealing to researchers. Among the many advantages of materialography applied to methods of characterization are: low-cost nondestructive analysis, fast acquisition of data, ease of preparing the samples, and multi-property estimation by analyzing only one image.

2. Methods

2.1 Image Analysis procedure

The proposed procedure for image analysis, applied to study the structure of materials, is as follows:

(1) Proper sample preparation. Although this stage does not belong to the image processing itself, it is important in the analysis because the image is the

source of the required information. A good image shall provide reliable data. The images cannot have shadows, dents or loose material, as these imperfections in the images may lead the system into an erroneous interpretation with false pores or edges. For example, when working with ceramic systems (non-conductive materials in general) a metallic coating (usually gold) should be applied to the sample to ensure enough conductivity. Then, a good image using the Scanning Electron Microscope (SEM) can be obtained. Failure to consider these recommendations will result in problems such as saturation of the image in poorly covered regions.

(2) Pre-processing and image processing. Once a good image has been obtained, the next step is the image processing itself. This is the most important stage in the extraction system, because its output isolates the particles to be measured during the image analysis. The better the processing of the input image, the closer the extracted information will be to the ideal [4]. The goal of image processing is to achieve the best possible location of the edges delimiting the areas of interest in the image. The accuracy of the system depends on these results. Taking these observations into consideration, a study was carried out in order to select an edge detector that provides the best performance, considering different attributes in the images. In this research, detection based on statistical moments was chosen, due to its ease of coding, computation, detection and thresholding.

It is important to point out that good edge detection relies on previous filtering. That way, false positives (due to noise or imperfections in the sample preparation) can be avoided. Given the importance of removing noise while preserving edges, it was decided to use relatively advanced filtering methods, such as nonlinear filtering [8, 9].

Once the base image is filtered, the boundaries of grains and pores are detected; then every particle of interest is extracted, except those that "touch" the edge of the image (incomplete particles), since we have total uncertainty about what is outside the image field.

(3) Image analysis. At this stage the particles are treated as individual objects and their measurement parameters are extracted according to the purpose of the research [4]. Finally, the extracted parameters or measurement descriptors are presented in MS-Excel spreadsheets for further analysis by the user.

2.2 Implementation

• Nonlinear filtering and relevant edge detection [5].

The importance of using non-linear filters lies in the fact that they self-adjust depending on whether the pixels are in the background or are part of an edge, eliminating the noise while preserving edges [6].

The developed filter examines the image on a pixel-by-pixel basis to determine whether the local pixel belongs to the surface of the object of interest, to an edge, or is noise. If the local pixel corresponds to noise, it is replaced by the desired value. On the other hand, if it belongs to an area of interest, its contrast and brightness values are increased. The filter has a filter factor, or selectivity, which makes the filter act to a greater or lesser extent. Furthermore, the filter has three edge-sharpening modes (normal, medium and high), chosen according to the characteristics of the image. The sharper the edges and the bigger the difference between them and the background, the better the detection.

It should be noted that in certain applications, such as the study of electron tunneling in a varistor, border sizing must be done carefully because this is the main parameter of interest.

Once the base image is filtered, it is necessary to segment the edges. An algorithm that models the image as a random variable was designed, with the statistical moments of a pixel of interest around a window being used to classify it either as a border pixel or as a background-image pixel [4]. The detector can be set in different configurations depending on the characteristics of the edges to be segmented. If, in the worst-case scenario, some edges are broken, they can be rebuilt manually.

Once this is done, the segmentation is refined by using mathematical morphology [7], automatically drawing one-pixel-wide borders around objects, linking broken edges and removing information of little interest after the segmentation. The result is an image where objects are differentiated by their borders and are ready to be analyzed one by one.

• Measurement descriptors used in this work

The measurements given by the system are classified as:

Densitometric measurements. Making an abstraction of this optical term, "densitometry" here refers to the measurement of lighting quantities (e.g. brightness of the image) of objects by establishing a relationship between the brightness of an object and some physical, chemical or biological property. For example, in an atomic force microscopy image, the gray level of a pixel represents the surface topography of the sample.

Field measurements. The descriptors reported by the system are the distribution of grains and pores, object counting, analyzed area, particle density, and the average size of the objects analyzed in the field of interest in the image.

Object measurements. The form factors given by the system are: porosity content (area and pore size), particle area, eccentricity, roundness factor, content of

porosity within the particle, and particle size. In this paper, we define the particle size as the average length of n axes passing through the center of mass, bounded by the contour of each analyzed object. The major and minor axes are, respectively, the longest and shortest straight lines passing through the center of mass, bounded by two points on the boundary. The former is also known as a diameter.

To obtain a percentage of error below 5%, it was found that six well-distributed axes between the major and minor axes are needed. Therefore, the measurement algorithm ensures a reliable measure of particle size, provided that the particle has been well segmented.

Because of the digitization, the measurements of length should be corrected using calibration factors that depend on the angle of inclination of these axes. For example, when measuring a distance between two pixels that form a straight line with unit slope, the correction factor is equal to √2. If the slope of the line is 0, the correction factor is equal to 1.

Figure 1 shows graphically the method used for measuring the particle size, which assumes the object under study has a length equal to the average of several axes. Figure 1.a shows the axes' arrangement. Figure 1.b shows the circles corresponding to three axes that model the object under study.

Figure 1: Designed method to measure grain size. a) Axes' arrangement. b) Another way to understand the model graphically.

• Codification

To develop the code, the authors used Microsoft Visual C# [10] in conjunction with the OpenCV library [11]. This platform was chosen because C# allows user interfaces to be designed relatively easily, and these can then be loaded on operating systems such as Windows XP or Vista. One of the advantages of working with C# is that there is a worldwide community sharing developments and experiences, waiting to be reused. In this work, C# was used to implement the human-machine interface (shell), while OpenCV was used for the algorithms of optical-digital image processing.

OpenCV is a library designed by Intel specifically for image and video processing applications. It can be called from most of the common development platforms, such as Dev-C, Matlab, Visual C++, C#, Linux, Mac OS X and Python. The advantage of using OpenCV for image processing is that better management of information can be achieved, using less time and memory.

3. Results

• Detection level and presentation of the analyzed particles

The material shown in Figure 2 was used to evaluate the efficiency of the filter and edge detector, based on the accuracy of the system in tracing the borders of grains. The left side of Figure 2 shows the detection level of the system and the suppression of incomplete particles in the image. On the right side, an "index" (the numbers in red) is assigned to each analyzed particle.

Figure 2: Detection level and presentation of the analyzed particles. This image corresponds to the ceramic material SnO2. Source: University of Cauca, Colombia.

• Some applications of the system

Figure 3 shows some materials evaluated with this system. a) corresponds to an image taken with a high-resolution optical microscope of an Andean deer hair, in order to determine its taxonomic classification through analysis of the structure present in the image. b) corresponds to an image used to measure the degree of porosity in underground rocks, to study the storage of petroleum and natural gas, particle size, and geological questions in general. c) corresponds to an AFM image of a nano-scale V/VN multilayer; from its morphological analysis it is possible to obtain the material's tribological properties such as hardness and abrasion. d) is an image taken with a stereoscope and corresponds to the fillers used to modify the properties of asphalt mixtures. In this case, the parameters of interest are the shape factor and eccentricity, because the adhesion and the friction coefficient of a road are highly affected by such parameters.
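To make the axis-averaging idea of Figure 1 concrete, the following minimal Python sketch estimates the size of one segmented particle from a binary mask. The function name, the uniform angular spacing of the six axes, and the ray-walking loop are assumptions made for this illustration rather than the authors' C#/OpenCV implementation; the inclination correction discussed above is handled implicitly by stepping in unit Euclidean increments along each axis direction.

import numpy as np

def particle_size(mask, n_axes=6, pixel_pitch=1.0):
    """Approximate particle size as the average length of n_axes chords
    through the center of mass of a single segmented particle (boolean mask).
    Sketch only; assumes the centroid falls inside the particle."""
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()              # center of mass
    lengths = []
    for k in range(n_axes):
        theta = np.pi * k / n_axes             # evenly spaced axis directions
        dy, dx = np.sin(theta), np.cos(theta)  # unit-length step vector
        reach = []
        for sign in (+1.0, -1.0):              # walk outwards in both directions
            t = 0.0
            while True:
                y = int(round(cy + sign * t * dy))
                x = int(round(cx + sign * t * dx))
                inside = 0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]
                if not inside or not mask[y, x]:
                    break
                t += 1.0
            reach.append(t)
        # Stepping by a unit Euclidean vector already accounts for the
        # sqrt(2)-style diagonal correction mentioned in the text.
        lengths.append((reach[0] + reach[1]) * pixel_pitch)
    return float(np.mean(lengths))

In practice the mask would come from the segmentation stage described in Section 2.1, with incomplete particles at the image border already discarded.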

Figure 3. Some examples of images analyzed with the system. Source: University of Cauca, Colombia.

4. Conclusions

The developed system provides a method to quantify the structural parameters of a material, linking the data obtained with some property of interest. Since matter exists in similar forms at different scales, this work can be extended to various areas of scientific research. Today's research in materials requires the integration of tools that provide precise measurements; in addition, the trend in industry is to design more efficient and cleaner manufacturing methods. Furthermore, traditional analysis methods are often limited and highly subjective, hence our interest in developing a user-friendly system that provides accurate measurements in reasonable analysis times. The interface assists the user through the entire analysis process, eliminating the need to be an expert in digital signal processing.

5. Acknowledgement

This work has been supported by NSF Grant No. CNS-0837556.

References

1. W. Callister, Materials Science and Engineering (John Wiley & Sons, 2007).
2. C. Carter & M. Norton, Ceramic Materials Science and Engineering (Springer, 2007).
3. ASTM Technical Publication 839, Practical Applications of Quantitative Metallography (International Metallographic Society, 1982).
4. R. Gonzalez & R. Woods, Digital Image Processing, 3rd Ed. (Prentice Hall, 2008).
5. A. N. Venetsanopoulos & I. Pitas, Nonlinear Digital Filters (Kluwer Academic, 1990).
6. H. R. Myler & A. Weeks, Computer Imaging in C (Prentice Hall, 1993).
7. W. Pratt, Digital Image Processing, 4th Ed. (Wiley, 2007).
8. L. Wojnar, Image Analysis: Applications in Materials Engineering (CRC Press, 1999).
9. J. Pertusa, Techniques of Image Processing (title translated from Spanish) (University of Valencia, 1993).
10. J. Sharp, Microsoft® Visual C#® 2005 Step by Step (Microsoft Press, 2005).
11. G. Bradski & A. Kaehler, Learning OpenCV (O'Reilly, 2008).

Network Security: A Focus in the IEEE 802.11 Protocols

Juan M. Monge Arroyo#1, Brendaliz Román Cardona*2, Jan Flores Guzmán#3
#Electrical & Computer Engineering and Computer Science Department, Polytechnic University of Puerto Rico
Alfredo Cruz, Ph.D.

Abstract— High impact vulnerabilities are discovered and exploited when something becomes the standard of the industry. The Institute of Electrical and Electronics Engineers (IEEE) has a set of standards for data communications named the IEEE 802 standards. This paper explains some of those standards, focusing on the 802.11 (wireless) protocols. Wireless Local Area Networks (WLANs) use air as the medium of data transfer, meaning that they can be intercepted without physical attacks. Interception methods or attacks such as denial of service and man-in-the-middle attacks are frequently used to intercept data or even to stop or deny the functioning of a computer resource. Most of the IEEE 802 standards use encryption methods to protect the data being transmitted. The WLAN security standards discussed in this paper are 802.11, 802.1x, and 802.11i.

Keywords— Network Security, Wireless LAN Security, IEEE 802.11.

I. INTRODUCTION

Local Area Networks (LANs) and Wireless Local Area Networks (WLANs) succeeded in providing network access to computers using guided or unguided transmission methods. The Institute of Electrical and Electronics Engineers (IEEE) has a set of standards and specifications for data communications in wired and wireless networks. When data is transmitted through non-guided mediums like air, interception becomes a possible threat. Some common attacks threatening wireless communications are smurfing, distributed denial-of-service, spoofing and others. The 802.11 protocol defined the Wired Equivalent Privacy (WEP) protocol to protect wireless communications. Unfortunately, WEP could not deliver secure wireless communications. Other security standards were created to address the problems WEP could not solve. Some of those standards are 802.11, 802.1x, 802.11i and others.

II. ENCRYPTION

A. Encryption

Security is the protection of data that is transmitted through unguided mediums like air. Encryption methods are used to protect the data being transmitted. Encryption addresses the following problems of information security: authenticity, integrity, confidentiality, and non-repudiation. Encryption works by using special keys to protect and unprotect data.

Encryption is divided into two categories: symmetric and asymmetric. In the symmetric type, all devices use the same key for encryption and decryption. The security of the symmetric method is completely dependent on how well users protect the key [1]. Figure 1 illustrates symmetric encryption, in which both parties use the same key.

Fig. 1 Symmetric encryption

In asymmetric encryption, the public key can be known to everyone, and the private key must be known only to the owner. Each key type can be used for encryption and decryption. With asymmetric encryption, the user has private access to the secret identifying information, which acts as a signing key. The user's public verification key is stored on the server. Identification is confirmed when the user signs a random string sent by the server using the appropriate private key. Figure 2 illustrates asymmetric encryption, which uses two different keys for encryption and decryption purposes.

Fig. 2 Asymmetric Systems

If the private key is used for encryption, it cannot be used for decryption, and vice versa.
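To make the asymmetric idea above concrete, the following toy example uses textbook RSA with deliberately tiny numbers. It only illustrates the key-pair behavior just described (what one key encrypts, only the matching key can decrypt or verify); it is not a secure implementation, and the primes, exponents and message are arbitrary choices for this sketch.

# Textbook RSA with toy numbers (illustration only, not secure).
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (2753); requires Python 3.8+

message = 65
ciphertext = pow(message, e, n)          # encrypt with the public key
assert pow(ciphertext, d, n) == message  # only the private key recovers it

signature = pow(message, d, n)           # "sign" with the private key
assert pow(signature, e, n) == message   # anyone can verify with the public key

Real systems use keys of thousands of bits together with padding schemes; the point here is only the complementary roles of the two keys.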

61 Hash functions ensure that the data being transmitted B. Smurfing Attacks arrives intact. Some of the uses of hash functions are: unique Smurfing is an attack that uses the ICMP protocol to file identification, data corruption detection, and others. As overwhelm the intended host with a large amount of echo long as a secure hash function is used, there is no way to take replies. To accomplish this, the attacker sends a large number someone's signature from one file and attach it to another, or of ICMP echo requests with the spoofed source address of the to alter a signed message in any way. The slightest change in a victim to broadcast addresses within smurf amplifier networks. signed document will cause the digital signature verification The smurf amplifier networks proceed to send the ICMP echo process to fail [1]. See figure 3 below for the better request to all the hosts in the local network. Once every host understanding of hash functions. in the local smurf amplifier network receives the ICMP echo request with the spoofed address of the victim, each one proceeds to reply to the victim with an ICMP echo reply. The large number of echo replies sent to the victim can easily overwhelm the system, causing the victim to suffer the effect of a denial of service. Near the end of the 1990s, the protocol standards were updated. The update resolved this issue in two ways: configure individual hosts and routers so that they do not respond to ping requests or broadcasts [2], and configure routers so that they do not forward packets directed to broadcast addresses by default. This effectively prevents the network from being exploited to attack. Since that time, the frequency with which this type of attack is used has dramatically decreased.
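The property described above, namely that the slightest change in a signed document causes verification to fail, can be observed directly with a standard-library hash; the example strings below are made up for the illustration.

import hashlib

original = b"Transfer $100 to account 12345"
tampered = b"Transfer $900 to account 12345"   # a one-character change

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(tampered).hexdigest())
# The two digests differ completely, so a signature computed over the
# original digest will not verify against the tampered message.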

C. Distributed Denial-of-Service Attacks Fig. 3 Hash Functions common operation In distributed denial-of-service attacks (DDoS), the III. NETWORK ATTACKS attackers compromise a large number of hosts over some length of time before the actual attack [3]. These hosts are The internet was created for information exchange sometimes collectively known as a botnet. A botnet can be between trusted hosts in universities. Initially, it allowed the defined as follows: A collection of compromised computers sharing of information between universities and its researchers (generally called zombie computers) running software in a cooperative environment. As such, security did not weigh (usually installed via drive-by downloads exploiting web in heavily in the initial design of the Internet Protocol Suite browser vulnerabilities, worms, Trojan horses, or backdoors) [2]. In today’s world, networks and the internet have emerged under a common command-and-control infrastructure. as a critical, evolving vehicle for global communication and Once attackers have established a botnet, they are able to commerce. As we rely more and more on network coordinate the software running on the zombie hosts to launch communications, this lack of inherent security has become an a simultaneous attack on a victim. The attack typically increasingly significant concern as attacks and exploitations consists of bombarding the victim with a large number of have evidenced this fact. Some common network attacks are: messages such as packets and other network traffic from SYN flooding, smurfing, denial of service, spoofing attacks multiple, numerous zombie hosts. and others. This kind of attack is relatively novel and thus, still under A. SYN Flooding Attacks research. It has shown to offer attackers several benefits over more traditional attacks. This can actually make this kind of To establish a connection, TCP performs what is known attack particularly difficult to prevent. Some work has been as a three-way handshake. Clients and servers used special done with ICMP traceback messages to help trace flooding synchronization/acknowledgement (SYN/ACK) packets to attacks back to the source hosts. Although this is helpful in establish a connection. The SYN flooding attack consists of sending a large number of SYN packets without detecting the source of the attacks, it does little in the way of acknowledging any of them. The victim accumulates each actually preventing it. SYN packet awaiting acknowledgement, until the software D. Spoofing Attacks could not hold anymore and either crashed the system or In a spoofing attack, the attacker is able to assume the locked out network access. The victim suffered a denial of identity of another in a communication by falsifying service as a result. information. There are many kinds of attacks that fall into this The solution for this attack was the use of a SYNcookie category. One of the common spoofing attacks exploits the [2]. A SYNcookie is simply a random number added to use of the address resolution protocol (ARP), a widespread SYN/ACK packets with the intention of replacing the need to protocol to associate a host’s physical MAC address to its keep the copies of SYN packets. That way, it’s not required to current IP address. keep information about half-open sessions.
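A minimal sketch of the SYN cookie idea described above: the server derives the value it places in the SYN/ACK from the connection parameters and a secret, so it does not need to store state for half-open sessions, and it only allocates a connection when the returning ACK carries a value that recomputes correctly. The helper names and the 64-second counter window are assumptions for this sketch, not the exact algorithm used by production TCP stacks.

import hmac, hashlib, os, time

SECRET = os.urandom(16)  # server-side secret, assumed to rotate periodically

def syn_cookie(src_ip, src_port, dst_ip, dst_port, counter=None):
    """Derive a SYN/ACK sequence number from the connection 4-tuple and a
    slowly changing counter, instead of storing per-connection state."""
    if counter is None:
        counter = int(time.time()) // 64
    msg = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}|{counter}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")

def cookie_valid(echoed_value, src_ip, src_port, dst_ip, dst_port):
    """On the final ACK, recompute the cookie for the current and previous
    counter windows; only if one matches is connection state allocated."""
    now = int(time.time()) // 64
    return any(
        syn_cookie(src_ip, src_port, dst_ip, dst_port, c) == echoed_value
        for c in (now, now - 1)
    )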

62 In an ARP spoof attack, the attacker intends to associate his own MAC address with the IP address of the victim, typically a default gateway. If the attack succeeds, then all traffic that was meant for the victim will be sent to the attacker. The attacker may then either forward the traffic to the victim (passive sniffing), or alter the information before forwarding it (man-in-the-middle attack). Another possibility for the attacker would be to associate the victim’s default gateway IP with a non-existent MAC address, effectively inflicting the user to a denial of service attack. The attacker accomplishes this by sending unsolicited

ARP responses to the victims with fake information. If the victim accepts and caches the information, this effectively Fig. 4 Shared Key Authentication Steps of 802.11 poisons their ARP cache, causing an unintended association between a legitimate IP address and the MAC address of the WEP protocol can be divided in two stages. In the first attacker. stage, a cyclic redundancy check (CRC) is made to the message and then is concatenated at the end of it. After the IV. WIRELESS LAN SECURITY STANDARDS CRC, an IV is selected. In the other stage of the process, the RC4 algorithm generates the WEP encryption key using the The wireless security goal is to prevent the unauthorized IV and the shared key [7]. Finally, the key stream generated is access to any type of system that uses wireless networks. XORed with the message to generate the encrypted message IEEE created three basic technologies for client authentication (cipher text) to be sent. Before sending, the IV is concatenated and protection. The technologies are: open system to the encrypted message. See figure 5 below. authentication, shared key authentication, and WEP. Later on, IEEE created the more secure IEEE802.11i improving the authentication process. The IEEE802.11i also protects the privacy and integrity of the data transmitted [4]. A. IEEE 802.11 There are two authorization methods defined in the IEEE802.11: open system and shared key. The open system authorization only requires the Service Set Identification (SSID) from the wireless station for authentication. This authentication method is totally unprotected because most APs broadcast the SSID. In the shared key authentication, the AP sends to the client a challenge text packet that the wireless station must encrypt with the correct WEP key and return it to the AP. If the client has the wrong key or no key, authentication will fail and the client will not be allowed to Fig. 5 Wired Equivalent Privacy (WEP) encryption algorithm associate with the AP [5]. See figure 4 below. The shared authentication is more exposing than the open authentication In the decryption process of WEP, the RC4 take the IV because in the first step of the process, the AP sends the from the encrypted message and the shared key to re-create correct WEP key to the client to associate. This WEP key can the WEP encryption key. Finally, the key stream is XORed be intercepted and cracked. with the encrypted message to create the original message [8]. WEP is a security protocol created to secure The only difference between the encryption and decryption is communications between the AP and the wireless station. It that the message to be XORed in the decryption process is can use the open system or the shared key authorizations. encrypted compared to the unencrypted message XORed in WEP is based on a stream cipher encryption symmetric key the encryption. The XOR operation simply yields true if two algorithm called RC4. Stream cipher is an encryption method bits are different, and false if they are equal. To technically where each bit of the data is sequentially encrypted using one understand why the process is the same whether the message bit of the key. It uses an initialization vector (IV) to produce a is encrypted or unencrypted. unique stream independent from other streams produced by The WEP security protocol has some major flaws. One of the same encryption key [6]. 
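As a simplified illustration of the RC4-based WEP scheme described above (an integrity value appended to the message, an XOR with a keystream seeded by the IV concatenated with the shared key, and the IV sent in the clear), the following Python sketch shows the encryption path. It is not a faithful 802.11 frame implementation, and it inherits all of the weaknesses discussed in this section.

import binascii

def rc4_keystream(key, length):
    # Key-scheduling algorithm
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation algorithm
    i = j = 0
    out = bytearray()
    for _ in range(length):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def wep_encrypt(shared_key, iv, plaintext):
    """Append the CRC-32 integrity check value, XOR with the RC4 keystream
    seeded by IV || key, and prepend the IV in the clear (sketch only)."""
    icv = binascii.crc32(plaintext).to_bytes(4, "little")
    data = plaintext + icv
    keystream = rc4_keystream(iv + shared_key, len(data))
    return iv + bytes(a ^ b for a, b in zip(data, keystream))

# Decryption is the same XOR: recompute the keystream from the received IV
# and the shared key, XOR it with the ciphertext, then re-check the CRC.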
those flaws is that the 802.11 standard does not state how the keys are going to be distributed, so in most cases every client in an AP uses the same key. Other major flaw is that there is a direct relationship between the WEP encryption keys and the IV used in a single session, so it is easy for an attacker who knows the WEP encryption key to capture the corresponding

63 IV [8]. Other important issue is that the key length is too short message and the tag are sent to the receiver to use a (40 bits). This was resolved in an update called WEP2. The verification procedure [7]. This algorithm is an upgrade over key was updated to 104 bits. the old CRC method because it has a counter that minimizes The TKIP is an improvement of the WEP protocol. It is recurrent attacks. It also protects the data and the header. an upgrade from WEP because it has longer key and IV length The Robust Secure/Security Network (RSN) provides a [7]. It is also better because it don’t have a direct relationship function to exchange security methods between the AP and between the IV and the WEP encryption key. It also client [7]. It uses RSM IE (Information Element) frames to eliminates the WEP key recovery attacks due to the key exchange this information. The RSN increases the flexibility mixing function. It is an upgrade but it has flaws. TKIP is still of the network and provide options to define the security based in the RC4 encryption. TKIP affect the network’s policies implemented in organizations. overall performance due to the amount of encryptions/decryptions it executes [4]. V. CONCLUSION

B. IEEE 802.1x Wireless networks will be always more susceptible to The IEEE802.1x is a protocol providing mutual attacks compared to wired networks. Most of the attacks can authentication and efficient key exchange between clients. be avoided or treated by using the correct process and protocol. This standard is based on supplicant, authenticator, and At first, the direction of wireless networks was to connect authentication server. In WLANs, the supplicant is the wireless devices to wired networks. After some security wireless station, the authenticator is an access node which problems, WEP was the first protocol to be used. After more allows wireless stations to access the network, and the problems occurred due to the poor protection, TKIP was authentication server is a server with authentication implemented. TKIP was a temporal solution due to the fact mechanisms [8]. that it was based in the WEP protocol. Finally 802.11i came, Authentication requests and replies are managed by the and solved most of the problems by implementing new Extensible Authentication Protocol (EAP). EAP encapsulates authentication methods and encryption algorithms. The the messages that travel from supplicants to the authenticator. 802.11i standard is very secure for now, but eventually new The IEEE802.1x also securely distributes encryption keys. problems will arise and a new standard will be needed. See figure 6 below. REFERENCES [1] CISSP. “CISSP All-in-One Certification Exam Guide.” 2007. [2] R. J. Anderson. “Security Engineering: A Guide to Building Dependable Distributed Systems.” 2008. http://www.cl.cam.ac.uk/~rja14/Papers/SE-18.pdf [3] M. Srivatsa, A. Iyengar, J. Yin, & L. Liu. “A Client-Transparent Approach to Defens Against Denial of Service Attacks.” 2006. [4] Y. Zahur, & T Yang. “Wireless Security and lab designs.” 2004. [5] Internet Security Systems. “Wireless LAN Security.” 2001. [6] N. Borisov, I. Goldberg, & D. Wagner. “Intercepting Mobile Fig. 6 EAP message flow that occurs during 802.1x authentication Communications: The Insecurity of 802.11.” 2001. C. IEEE 802.11i [7] A. M. Al Naamany, A. Al Shidhani, & H. Bourdoucen. “IEEE 802.11 The IEEE802.11i is another standard created to improve Wireless LAN Security.” 2006. authentication, integrity and data transfer. It was created to [8] J. C. Mitchell. (2005) “Security Analysis and Improvements for IEEE solve the main problems of the WEP and TKIP. WPA2 is the 802.11i.” 2005. protocol based in this standard. The 802.11i is capable of two http://www.isoc.org/isoc/conferences/ndss/05/proceedings/papers/NDS methods of authentication: 1) 802.1x and EAP to authenticate S05-1107.pdf the users, 2) per-session key per-device. The second method has a shared key called Group Master Key (GMK) and it is used to derive the other. GMK is used to derive Pair Transient Key (PTK) and Pair Session Key (PSK) to do the encryption and authentication [5]. A new algorithm called Michael is used in the IEEE802.11i to calculate an 8 byte integrity check called Message Integrity Code (MIC). Michael produces a special tag using a 64 bit authentication key and the message to be encrypted as inputs [7]. The method for creating the special tag uses XOR operations, bit swapping and addition. The

A Specification and Pattern System to Capture Scientific Sensor Data Properties

Irbis Gallegos1 Mentors: Ann Gates2, Craig Tweedie2 Center of Excellence for Sharing resources for the Advancement of Research and Education through Cyberinfrastructure The University of Texas at El Paso, El Paso TX 79936, USA Emails: [email protected] , 2{agates, ctweedie}@utep.edu

Abstract limitation with this practice is that a large amount of incorrect data can be undetected for extended periods of To conduct research on the causes of global time, resulting in datasets that have a of correct and environmental changes, environmental scientists use incorrect data. The incorrect data is often not identified advanced technologies such as wireless sensor networks until the field scientists return to the laboratory. As a and robotic trams equipped with sensors to collect data, result, the data gathering process has to be repeated. e.g., spectral readings, ground temperature, ground Repeating a data gathering process is expensive. moisture, wind velocity, and light spectrum. Indeed, Sensors need to be redeployed and possibly the amount of data being collected is rapidly increasing, recalibrated; the amount of time required to gather the and the ability to promptly evaluate the accuracy of the data can be large and for time sensitive data required data and the correct operation of the instrumentation is for policy decision-making it might not be even critical in order to not lose valuable time and possible to repeat the data gathering process. Since information. An approach based on software- environmental science sensor data are non-reproducible engineering techniques is being developed to support entities, i.e., the observation at a given time and set of the scientist’s ability to specify data properties that can conditions can only be taken once, the knowledge that be used for near real-time monitoring of data streams. could have been obtained from the data is lost and This paper describes the Data Property Specification unrecoverable unless a redundant system has been (DaProS) tool used to specify software properties based deployed. The importance of data to the study of the on patterns and decision trees. The initial findings from environment and their value to society emphasizes the using the DaProS tool to specify data properties from need to develop mechanisms and procedures to verify different environmental science projects includes the the integrity of the data in near real-time. identification of several factors that can limit the The process to ensure the quality of data can be effectiveness of data properties in scientific divided into two stages, a property specification stage community’s practices. and a verification stage. In the property specification stage, a practitioner specifies a set of properties that can 1. Introduction be used to check the quality of the data. In the verification stage, a mechanism or system checks that

To study current trends and changes in the Earth’s the data adheres to the specified properties. The focus climate, environmental scientists have begun using of this work is on property specification. Indeed, the advanced sensor technology such as meteorological quality of data verification is as good as the quality of towers, wireless sensor networks, and robotic trams the properties specified. equipped with sensors to perform data collection at A goal of this research is to define an environmental remote research sites. As the amount of data collection scientist-centered approach for specifying data instruments introduced in the field increases, so does properties based on temporal and data relationships the amount of environmental sensor data acquired in associated with sensor data. Specifically, there is an real time by such instruments. The instrumentation interest on determining how a specification pattern typically does not include mechanisms to check the system approach to specification can be applied to quality of the acquired data in real time. A common improve sensor data error detection. practice for environmental scientists is to collect sensor The importance of data to the study of environmental data for extended periods of time and verify that the sciences emphasizes the increased need to define data adhere to predefined data quality standards once processes for verifying the integrity of the data and to the data are about to be stored in the database. The develop mechanisms that can provide scientists

65 assurance at near real time concerning the correctness A data pattern specification system was developed to of the data obtained by field-based instruments. For address the limitations of current practice and to assist example, corrupted sensor data can cause the scientist in specifying data properties. Patterns miscalculations that could have a major impact in provide templates of properties that are commonly environmental policies. used. Data properties generated using the specification Examples of major events derived from bad data and pattern system can be translated or interpreted for include the Ozone Mapping Spectrometer (TOMS) on use by data verification mechanisms. The specification board of the Nimbus-7 satellite that failed to detect the and pattern system has been designed to be Earth’s stratospheric ozone depletion in some areas [1], implementation agnostic. As a result, scientists can or the U.S National Snow and Ice Data Center project refine existing data properties, or can specify new data underestimation for weeks of the extent of Arctic sea properties, using data entities available from the data or ice by five hundred thousand square kilometers due to metadata being collected. an undetected sensor drift [2]. In addition, a recent The Dwyer et al. [4] Specification and Pattern study to determine the quality of the temperature System (SPS) was developed to facilitate the measurements obtained from the 1,221 continental specification of critical software properties; however, United States climate-monitoring stations overseen by the SPS is unable to capture timing-based requirements. the National Weather Service showed that only 11 Konrad and Cheng [5] addressed this limitation by percent of surveyed stations are of acceptable quality. extending the SPS to specify real-time properties; The errors in the record exceeded, by a wide margin, neither the SPS, nor the work by Konrad and Cheng, the purported rise in temperature of 0.7ºC (about 1.2ºF) allow the specification of data properties. The proposed during the twentieth century [3]. The proposed solution work uses and modifies the concepts from both the addresses such needed data integrity by providing a original SPS and the real-time extension to build a data means to specify data properties for improving error property specification and pattern system. detection in scientific sensor data. In SPS, properties are of type Boolean and are verified against execution traces. Patterns and scope, 2. Proposed Solution i.e., the state of execution over which a property holds, are strictly coupled to Boolean values. For DA-SPS, 2.1 Data Specification and Pattern System properties are evaluated over a set of data values instead of program execution states. Scientists need the means to specify error detection The main difference between the DA-SPS and SPS is properties for field-based data gathering mechanisms the semantics of the scope and pattern definitions in without the need of having an extensive technical both approaches. In DA-SPS, the scopes are defined in background. A frequent practice is for scientists to terms of datasets and dataset values. Global, Before R, embed error detection properties into the system’s and After L scopes present no major semantic source codes. The main drawback for this approach is differences. 
Global denotes the entire dataset; Before R that the error detection properties are limited by the denotes a sequence of data values that begin with the system’s implementation and what the scientist had in start of the dataset and ends with the data value mind in terms of error detection properties when the immediately preceding the data value at which R occurs system was conceived. Further changes to existing error for the first time in the dataset; and After L denotes a detecting properties and the addition of new properties subsequence of data values that begins with the first require a code developer to modify the existing source data value at which L holds and ends with the last data code. Due to the frequent changes in scientific value of the dataset. In DA-SPS, Between L and R procedures and the large amount of field experiments denotes a subsequence of data values that begins when variables, there is a need for a dynamic way to specify L occurs and ends with the data value immediately error detection properties. preceding the data value at which R occurs; and After L Another common practice is for scientists to use until R denotes a subsequence of data values (an natural language to specify data properties. The interval) that begins when L occurs and ends either with ambiguity of natural languages, however, might cause the data value immediately preceding the data value at properties to be misinterpreted by other scientists and which R occurs, or begins when L occurs and ends with data verification systems. An alternative to specify data the termination of the dataset. DA-SPS assumes that the properties while mitigating ambiguity is for the scientist dataset contains unique elements since the time stamp is to specify the properties using some sort of formalism included in each element of the dataset. or mathematical logic. A drawback is that scientists DA-SPS pattern definitions differ from SPS pattern may not have the background or the desire time to learn definitions. While the SPS uses events and conditions new formalisms or mathematical logics. to delimit and verify properties, DA-SPS uses Boolean functions. In DA-SPS Boolean functions are applied to

66 the data subsets delimited by the property scopes. The LessThanOrEqual function compares two double verification is then performed based on the results of values, a and b, and returns true if a is less than or evaluating the Boolean function. equal to b, false otherwise. The high level specification In addition, the original SPS does not include timing using DA-SPS will look like: information. Similar to the work by Konrad and Cheng, the DA-SPS makes the distinction between timed Scope: Between 06:15:00 AM and 07:59:59 PM patterns and qualitative patterns. Qualitative patterns Pattern: Universality specify qualitative properties about the data that do not LessThanOrEqual(Temperature_Reading,79ºF) capture quantitative timed behavior. The Qualitative patterns are: Absence, Universality, Existence, The resulting DA-SPS specification can be directly Precedence and Response. Timed patterns capture data translated and verified by a data verification behavior that is timed dependent. mechanism. Timed patterns that capture data properties which bound the duration of an occurrence are referred to as 2.2 Tool Support Duration; those that address periodic occurrences are referred to as Periodic; and those that place time A Data Property Specification prototype tool bounds on the order of two occurrences are referred to (DaProS) was developed to assist practitioners specify as Order. The Timed patterns are further classified as and refine data properties that can be verified by data follows: Duration patterns--Minimum Duration and verification tools. The tool uses the DA-SPS described Maximum Duration; Periodic pattern--Bounded briefly above and property categorization [7] to specify Recurrence; and Order patterns--Bounded Response the data properties. and Bounded Invariance. Timed patterns are interpreted The categorization divides data properties into two over a discrete time domain, such as ℕ. Timed patterns major types, experimental readings and experimental assume a system clock; the clock tick is global, but it is conditions. Experimental reading properties specify treated as a local entity at each data value. For timed expected values and relationships related to field data patterns, the independent value associated to each data readings and, by definition, can be used to capture value in the dataset is assumed to be a discrete time random data errors. Experimental condition properties value. specify expected instrument functioning and To illustrate the strengths of the DA-SPS, consider relationships by defining instrument behavior. By the following natural language data properties obtained definition, these types of properties can capture from a literature review, which was conducted to systematic errors. Each major category is further identify scientific projects with published scientific data subdivided into more specific categories. properties that could benefit from a data property DaProS guides practitioners through a series of categorization, the type of analyzed data, and the type decision trees and questionnaires to specify and refine of data properties as defined by each project. The data properties. To validate the intended meaning of review identified 15 projects that collect environmental specified properties, DaProS uses disciplined natural data from sensors. language to generate natural language descriptions of specified properties. 
The disciplined natural language For May 12th at daytime, the dry bulb temperature descriptions are generated using a Backus-Naur Form should be less than or equal to 79.0 degrees Fahrenheit grammar representation of the data categories and the [6]. data patterns. The DaProS natural language representation of the The specification can be expressed using DA-SPS as property specification from Section 2.1 is as follows: follows. The data interval for the scope encompasses the data values recorded between 06:15:00 AM and For the dataset data enclosed by the data interval 07:59:59 PM according to average sunrise and sunset starting with 06:15:00 AM and ending with the datum times for May 12th over the last 10 years. A practitioner immediately prior to 08:00:00 PM , it is always the would use the Between L and R scope where, L case that dry bulb temperature<=79.0 holds. represents time entity 06:15:00 AM and R represents time entity 07:59:59 PM in this case. The practitioner The representation allows the practitioner to validate would use the following pre-defined function: the intended meaning of the specified property. The LessThanOrEqual(double a, double b) representation, in addition to the specification process, In addition, he or she would use the Universality pattern helps the scientist identify the subtleties of the property and apply it to every dry bulb temperature reading a in that were not captured by the original English the interval. Temperature reading a is compared to the description in the documentation. predefined value 79.0 degrees Fahrenheit. The

67 3. Preliminary Results literature review. The Specification and Pattern System (SPS) used in software engineering was adapted to DaProS was used to specify 529 data properties capture data properties, resulting in a new Data extracted from the projects included in the literature Specification and Pattern System (DA-SPS). In review. The practitioner using DaProS was not able to addition, the Data Property Specification (DaProS) tool capture three data properties that were stated at such a was developed to facilitate specification of data general level that they were ambiguous. properties associated with sensors. DaProS is based on During the specification process, the following the data property categorization and DA-SPS that was observations were made that must be considered to defined through this research effort; it uses decision ensure the effectiveness of the specification trees to guide practitioners through the data property categorization: specification process. The DaProS tool provides  Some data properties are described at such an disciplined natural language description of specified abstract level that it is difficult to translate such properties to allow practitioners to validate the intended specifications into data properties that can be meaning of the specification. Also, DaProS can export automatically verified. the generated properties to data verification tools.  Due to the inherent ambiguous nature of natural languages, data properties derived from written Acknowledgements specifications must be verified by an expert to ensure the intended property meaning is captured. This work is supported by National Science  Some data properties are complex, requiring them Foundation award # CNS-0540592, award # HRD- to be decomposed into several simpler properties. 0734825 and award # CNS-0837556.  A number of specifications are a combination of data verification and data steering properties. References  Combined property specifications require both verifying that the properties adhere to predefined 1. M. King, D. Herring, “Research Satellite for behaviors, i.e., the verification aspect, and Atmospheric Sciences, 1978-present,” in guaranteeing that a reaction occurs in response to a Encyclopedia of Atmospheric Sciences, Academic data or instrument stimulus, i.e., the steering Press, 2002. aspect. As a result, combined property 2. A. Morales. (Feb 20,2009). “Arctic Sea Ice specifications must be decomposed into data Underestimated for Weeks Due to Faulty Sensor,” verification properties and data steering properties. http://www.bloomberg.com/apps/news?pid=2060111 0&sid=aIe9swvOqwIY 3. Watts, A., “Is the U.S. Surface Temperature Record 4. Summary Reliable?” Chicago, IL: The Heartland Institute, 2009. The amount of sensor technology that collects 4. Dwyer, M. B., Avrunin, G. S., and Corbett, J. C., “A environmental data at research sites is rapidly System of Specification Patterns,” In Proceedings of increasing. Scientists need to be assured that the the 2nd Workshop on Formal Methods in Software collected data sets are correct. The equipment, however, Practice, Sept. 1998. typically does not include mechanisms to check quality 5. Konrad, S., Cheng, B.H.C., "Facilitating the of acquired data in real time. The need for correct data construction of specification pattern-based has pushed for the development of mechanisms and properties," In Proceedings of the 13th IEEE procedures to verify the integrity of the data. 
To support International Conference on Requirements scientists who need to specify properties for near real- Engineering, pp. 329-338, 29 Aug.-2 Sept. 2005 time monitoring of data streams, an approach based on 6. Canada Federal Department of Fisheries and Oceans. software-engineering techniques was developed. The Data Quality Assurance (QC) at the Marine goal of the research is to improve sensor-data error Environmental Data Service (MEDS). “DFO ISDM detection through scientist-specified properties that Quality Control”. http://www.meds-sdmm.dfo- capture temporal and data relationships regarding mpo.gc.ca/meds/Prog_Int/ WOCE/WOCE_UOT/ experimental conditions and instrumentation used in qcproces_e.htm. January 21, 2009 environmental science studies. 7. Gallegos I., Gates A.Q., Tweedie C., “Toward The work presented in this paper shows how a Improving Environmental Sensor Data Quality: A Preliminary Property Categorization”. In special issue specification pattern system approach to specification of IEEE Internet Computing, Information Quality in can be developed and applied to improve error- the Internet Era. Under Review detection in data. A data property categorization was defined based on the findings and analysis from a

An Assistive Technology Tool for Text Entry Based on N-gram Statistical Language Modeling

Anas Salah Eddin and Malek Adjouadi Center for Advance Technology and Education Department of Electrical & Computer Engineering Florida International University, Miami FL 33174, USA Emails: [email protected], [email protected]

Abstract

The focus of this experimental study is placed on suggesting a human-computer interface (HCI) for seamless text entry based on an Eye Gaze Tracking system. This interface is designed to serve as an assistive technology tool to help persons with severe motor disability enter text. An enhanced on-screen keyboard is used with the help of a trigram language model as a prediction routine to increase the efficiency of text entry. Suggesting highly probable words based on the context of previously entered text attained the desired improvement. The word prediction routine proved superior in clicking efficiency of text entry over stand-alone on-screen keyboards. Comparative results using volunteer subjects ascertained this statement.

1. Introduction

N-gram statistical language modeling is very commonly used in the major natural language processing applications (Jurafsky & Martin, 2000), such as speech recognition and machine translation; yet, it is not widely reported with Eye Gaze Tracking (EGT) systems. EGT-based systems allow users to interact with the computer using only their eye gaze. These systems, however, have some limitations, one of which is the lack of an efficient text entry method. Therefore, this paper suggests the use of N-gram language modeling to make text entry via EGT systems less frustrating and more efficient. In addition, the "Word Completion" software program integrated in this study can be further exploited by other HCI modalities, such as touch-screen PDAs and other typical keyboards.

1.1 N-gram Language Modeling

Text entry enhancement, as proposed in this paper, is based on guessing the next highly probable word, which will eventually increase the speed and efficiency of text entry. One way to predict the next word computationally is by calculating its probability in a training text (corpus). Calculating the probability of a word depends on counting its occurrences in the corpus and then dividing that count by the total number of words (tokens). However, this probability, also called the unigram, does not produce a really efficient prediction because of its dependency on the corpus. For instance, assuming that a unigram assigned the word "world" a higher probability than "book", this unigram will fail to suggest the word "book" after "He wrote the". Thus, looking back at the words that come before the suggested word will increase the credibility of the suggestion routine. Calculating the probability of a word while taking into consideration the words before it is slightly more complex than the unigram suggested above; for example, when looking one word backwards (bigram), the probability can be calculated as follows (Jurafsky & Martin, 2000):

P(w_n | w_{n-N+1}^{n-1}) = C(w_{n-N+1}^{n-1} w_n) / C(w_{n-N+1}^{n-1})    (1)

where w_n stands for the nth word and C stands for the count.

Looking backwards more than one word will evidently increase the predictability of the suggestion routine; hence stems the importance of the N-gram, whose general-case probability takes the same form as equation (1) for an arbitrary history length N (Jurafsky & Martin, 2000).

When facing a word that did not occur in the corpus, the last equations would assign it a probability of zero, which ambiguously lumps it together with spelling mistakes. For example, if the word "prescribed" was never seen in the corpus it will be assigned a zero probability; similarly, the spelling mistake "iefhgilgh" will be assigned the same probability. This problem can be partially solved by smoothing the language model. There are several smoothing algorithms; the one used throughout this paper is the Good-Turing Discount algorithm. Briefly, this algorithm reduces the sum of probabilities from 1 and splits the discounted value equally among all unseen words. Returning to the last example, after smoothing the language model both

words "prescribed" and "iefhgilgh" will be assigned, for example, 0.0002 instead of zero. This is very useful when calculating the probability of a sentence, where several probabilities are multiplied and a zero answer will be avoided. Since most calculated probabilities are small, it is recommended to use the logarithmic scale to avoid computational underflow.

1.2 Backoff

The discounting idea discussed above can be exploited intelligently by using it in combination with the Backoff algorithm. For example, assuming a trigram, which is generated with a Good-Turing discount, and a trigram combination that was not faced in the corpus, the Backoff algorithm suggests looking at the bigram for the same context and, if not found, looking back at the unigram. The Backoff can be generalized to an N-gram. For instance, if "This computer damaged" was not found in the trigram when trying to predict a possible word, the Backoff algorithm suggests looking for "computer damaged" in the bigram; if still not found, the algorithm suggests looking for "damaged" in the unigram.

$$\hat{P}(w_i \mid w_{i-2} w_{i-1}) = \begin{cases} P(w_i \mid w_{i-2} w_{i-1}) & \text{if } C(w_{i-2} w_{i-1} w_i) > 0 \\ \alpha_1 P(w_i \mid w_{i-1}) & \text{if } C(w_{i-2} w_{i-1} w_i) = 0 \text{ and } C(w_{i-1} w_i) > 0 \\ \alpha_2 P(w_i) & \text{otherwise} \end{cases} \quad (2)$$

Both α1 and α2 in Equation (2) are introduced to comply with the last note of reducing the probability of the Backoff step so that the total sum of probabilities does not exceed one.

2. Methods

The HCI modality used in this research is specifically designed for people with severe motor disability to harness the power of computing. Persons who are unable to grasp and use a computer mouse or are unable to press on keyboard buttons, as means to enter text, will view such assistive technology tools as imperative ways to communicate with their surroundings.

The modality consists of three components:
• Eye Gaze Tracking System
• Enhanced on-screen keyboard
• Word Completion software

Each of which is thoroughly explained below.

2.1 Eye Gaze Tracking System

The EGT system used in this research consists of a remote infrared camera and an infrared light source; both are used to capture a series of images of the user's eye. The images captured are sent to an image processing board that detects the EGT field coordinates depending on the eye pupil and glint (Guestrin & Eizenman, 2006). The output data of the image processing board are then sent to a calibration board, which is used to frame the field of view and set its limits. The data is made available on the operator computer's serial port and then sent to the stimulus computer.

Figure 1: The Components of the Eye Gaze Tracking System

Next, the stimulus computer gets the data and passes it to a software program that maps the EGT coordinates to the screen coordinates, giving the operator the chance to choose the mapping algorithm; the algorithm used in this paper is the conventional linear mapping. Then, the collected data is passed through a simplified jitter reduction module that averages five coordinate points to update the mouse cursor position according to the output of this module. Additionally, the head movement effect on the remote camera is not compensated but is instead addressed by a typical headrest/restrictor. Figure 1 depicts the utilized system for our study.

2.2 Enhanced On-Screen Keyboard

Most modern operating systems are equipped with a virtual on-screen keyboard (OSK) that helps increase computer accessibility. This research will adapt a similar solution by employing an enhanced on-screen keyboard (EOSK), shown in Figure 2, specifically designed to work with the EGT system explained above. The EOSK used in this research consists of 99 buttons arranged in the most commonly used keyboard layout employed by North American English PCs (QWERTY). The EOSK helps users type in both letters and commands. Furthermore, the EOSK is designed to always be placed on top of all other running programs and windows without taking the focus itself, which renders it perfect to be used directly with any other running program, from "", for example, to "Notepad". Several other reported solutions to enhancing text entry with EGT do not offer the same seamless profile, but instead restrict the user to the hassle of copying and pasting the text between the solution software program and the target program, like Dasher (Ward & MacKay, 2002), illustrated in Figure 3. Some other solutions constrain the user to typing inside the solution software, like (Mehta, 2007).
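To make Equation (2) concrete, the following sketch shows how a trigram score with bigram and unigram backoff could be computed from raw counts. It is a minimal illustration, not the authors' implementation: the toy corpus, the count tables, and the fixed backoff weights α1 and α2 are hypothetical, and a production model would use Good-Turing discounted estimates and properly normalized weights (for example, as produced by SRILM).

```python
from collections import Counter

# Hypothetical toy corpus; a real model would be trained on the Brown corpus.
tokens = "he wrote the book and he wrote the first book".split()

unigrams = Counter(tokens)
bigrams  = Counter(zip(tokens, tokens[1:]))
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
total = sum(unigrams.values())

# Assumed backoff weights; Equation (2) requires alpha1 and alpha2 chosen so
# that the discounted probabilities still sum to at most one.
ALPHA1, ALPHA2 = 0.4, 0.4

def backoff_prob(w1, w2, w):
    """P-hat(w | w1 w2) following the case structure of Equation (2)."""
    if trigrams[(w1, w2, w)] > 0:                      # trigram was seen
        return trigrams[(w1, w2, w)] / bigrams[(w1, w2)]
    if bigrams[(w2, w)] > 0:                           # back off to the bigram
        return ALPHA1 * bigrams[(w2, w)] / unigrams[w2]
    return ALPHA2 * unigrams[w] / total                # back off to the unigram

print(backoff_prob("wrote", "the", "book"))
print(backoff_prob("wrote", "the", "world"))
```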

70 work in full compatibility with EOSK; therefore, the WCS also show on top of all other windows. Hence, it can, as the EOSK, be directly used with any other open application. The WCS is intuitively easy, and does not need extensive training. It consists of a list of highly probable words generated by the last two entered words; those words are shown on top of the generated Figure 2: Enhanced on-screen keyboard list. The WCS is also equipped with a user-friendly scrolling mechanism that is comprised of two The EGT system explained above employs one remote rectangular areas above and below the words list, as camera and one infrared light source that makes shown in Figure 4. The user can scroll up the words list controlling the mouse cursor prone to artifacts from by gazing at the top area, whereas scrolling down the head movement and eye saccade. Therefore, and since words list can be done by gazing on the bottom area. it is ergonomically not feasible to make all EOSK buttons large, a zooming option is being used as suggested by (Sesin et al., 2008), where a button is enlarged when the mouse cursor hover over it, thus, costing the user less effort in focusing on the button with all the cursor jittering, the button is then returned to its original size when the mouse cursor leaves its borders. This zooming option gives the user more flexibility without consuming the available workspace.
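The ranked list produced by the Word Completion Software described in Section 2.3 (as in Table 1 later in the paper) can be pictured as a top-k query against such a language model. The sketch below is a stand-in, not the actual WCS code: it assumes a scoring function shaped like the backoff_prob sketch above and a known vocabulary.

```python
def suggest(w1, w2, vocabulary, score, k=10):
    """Return the k most probable next words after the context (w1, w2)."""
    ranked = sorted(vocabulary, key=lambda w: score(w1, w2, w), reverse=True)
    return ranked[:k]

# Example, reusing the hypothetical model from the previous sketch:
# suggest("wrote", "the", set(unigrams), backoff_prob) would return a ranked
# word list analogous to the one shown after "He wrote the".
```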

Figure 4: The word completion software

Additionally, both areas show the word in focus inside them, which help speed up the scrolling process by giving the user a quick feedback without the need to look back and forth between the word list and the scrolling areas. Moreover, the user can enter the word in focus by looking at the top of the WCS where the past two words are displayed. Going back to the “world/book” example in 1.1, the WCS will generate the following list after “He wrote the”: Figure 3: The Dasher screen. Courtesy of Table 1: A Word List generated by the WCS after “He (Ward & MacKay, 2002) wrote the.” 1. book 2. first Finally, to press a button on the EOSK a clicking timer 3. same 4. other has been set to help lower the eye lid muscular effort, 5. most 6. new where the user has to stare at the enlarged button for a 7. united 8. world selectable period of time. Additionally, an audio/visual 9. state 10. two feedback is used to acknowledge pressing the button. The last table clearly supports the assumption that 2.3 Word Completion Software language model prediction is superior to typical dictionary prediction, where a word list arranged Suggesting highly probable words will increase the lexicographically generally would not be sufficient to typing efficiency. As illustrated in 1.1 a trigram increase text entry efficiency dramatically. language model with a Backoff Good-Turing discount smoothing algorithm, which is trained using the Brown 3. Results corpus, is used as the backbone of the Word An experiment was conducted to assess the text entry Completion Software (WCS), this language model was improvement attained by: (a) the EOSK, (b) the EOSK built using SRILM (Stolcke, A. 2002) following the augmented with the word prediction routine provided recommendations in (Foster, G. et al., 2002) and by the WCS (EOSK+WCS), and (c) the Dasher (Goodman, J. et al 2001). The WCS was designed to method. This evaluation is achieved by a statistical

71 ANOVA F-test performed on both the calculated text 4. Conclusion entry speed and button clicking efficiency. Eye gaze tracking system is a very promising assistive The experiment consisted of three stages, during both technology that can help persons with severe motor the EOSK and the EOSK+WCS stages the subjects disabilities. The intrinsic cursor jitter problem, were asked to replicate a lengthy sentence, which however, makes entering text using EGT a very consisted of thirteen words or more, shown on top of frustrating unfavorable experience. To overcome this the screen to a text box shown right below it; whereas, problem, we introduced in this paper a seamless text during the Dasher stage the subjects were asked to pick entry mechanism that consists of an on-screen keyboard up a lengthy sentence that, similarly, consisted of at and a word suggestion software program based on a least thirteen words and then enter it. The subjects were trigram language model trained on the Brown corpus given a couple of minutes resting period between each text collection. The suggestion routine did not show any stage, and the EGT was recalibrated prior to each stage improvement in the entry speed; yet, it made the entry to maintain a reasonable control. All the collected data procedure a much appreciable process. An approach was obtained using a screen resolution of 800x600. that integrates the language model learning process in Six volunteers were recruited to conduct the ANOVA the WCS can be used in future work to make the F-test with a power of 98%. During each stage the text suggestion mechanism more intelligent, increasing text entry speed in words per minute (wpm) was collected. entry efficiency even further. In addition, during the EOSK and EOSK+WCS Acknowledgment experiments the number of issued clicks on the on- The authors appreciate the support provided by the NSF screen keyboard was noted alongside the total number grants CNS-0540592, HRD-0833093, CNS- 0837556, of characters in the test sentence. The subjects did not and CNS-0426125. have any prior experience with any of the solutions; therefore, they were given a few minutes trial period at References the beginning of each stage to get familiar with it. [Foster, G., Langlais, P., & Lapaime, G. (2002)]. User- Friendly Text Prediction for Translators. ACL-02 The text entry speed for the three studied solutions was conference on emperical methods in natural proven to be statistically congruent and was language processing. 10, pp. 148-155. Philadelphia. approximated to be 3 words per minutes. On the other [Goodman, J., Venolia, G., Steury, K., & Parker, C. hand, using the word prediction routine has (2001)]. Language Modeling for Soft Keyboards. dramatically increased the clicking efficiency from 67% Microsoft Corporation, Microsoft Research. (EOSK alone) to 170% (EOSK+WCS). Consequently, Redmond: Microsoft. this presumes that the word completion software should [Guestrin, E. D., & Eizenman, M. (2006)]. General have produced a faster text entry speed. However, the Theory of Remote Gaze Estimation Using the Pupil cognitive overhead that accompanies the WCS is Center and Corneal Reflections. IEEE Transactions probably the reason why it does not produce a higher on Biomedical Engineering. 53, pp. 1124-1133. speed. This cognitive overhead can be explained by the Toronto: IEEE. fact that the user has to go back and forth between the [Jurafsky, D., & Martin, J. H. (2000)]. 
Speech and predicted words list on the WCS and the EOSK in Language Processing - An Introduction to Natural search for the desired word. Additionally, the time Language Processing, Computational Linguistics, needed to scroll through the words list on the WCS and Speech Recognition (1st Edition ed.). (M. might have appended some delay to the entry process. Horton, Ed.) ,New Jersey, USA: Prentice-Hall, Inc. Such an outcome should thus be weighed against the [Mehta, A. (2007)]. When a Button Is All That added frustration of using the EOSK alone, requiring Connects You to the World. In A. Oram, & G. that each word is written as one letter at a time. Wilson, Beautiful Code (pp. 483-501). Sebastopol, Most of the subjects have expressed a positive opinion CA, USA: O’Reilly Media, Inc. on the merits of the predicted words list and expressed [Sesin, A. et al., (2008)]., “An Adaptive Eye Gaze that this is a good solution provided that they were Tracking Using Neural Network- Based User unable to use the mouse. Dasher addresses the cognitive Profiles to Assist People with Motor Disability”, overhead problem faced with the predicted words list Journal of Rehabilitation Research and with a unique approach; however, most of the subjects Development, Vol. 45 (6), pp. 801-817, 2008. did not feel comfortable using it and reflected that it is [Stolcke, A. (2002)]. SRILM - An Extensible Language too messy or significantly unintuitive compared to any Modeling Toolkit. Intl. Conf. Spoken Language on-screen keyboard solution. Processing. Denver. [Ward, D. J., & MacKay, D. J. (2002)]. Fast hands-free writing by gaze direction. Nature , 418, 838.

72 Detecting the Human Face Vasculature Using Thermal Infrared Imaging

Ana M. Guzman, Malek Adjouadi, Mohammed Goryawala

*Center for Advanced Technology and Education Department of Electrical & Computer Engineering Florida International University, Miami FL 33174, USA Emails: [email protected], [email protected], [email protected]

Abstract emotions, as well as temperature increase on the ear and cheek after using a cellular phone [8]. The study This paper presents preliminary findings using thermal presented in [7,1] is the only study up to date that infrared imaging for the detection of the human face provides an algorithm for the extraction of the human vasculature network at the skin surface. A thermal face vasculature network in the thermal infrared infrared camera with reasonable sensitivity provides the spectrum. The purpose of our experiment is to replicate ability to image superficial blood vessels on the human the study in [7] and to create our own algorithms for skin. The experiment presented here consists of the future research on the creation of a robust biometric image processing techniques used in thermal infrared system using the human face vasculature for the images captured using a mid-wave infrared camera purpose of human identification or recognition. from FLIR systems. For the purpose of this experiment thermal images were obtained from 10 volunteers, they were asked to sit straight in front of the thermal infrared camera and a snapshot was taken of their frontal view. The thermal infrared images were then analyzed using digital image processing techniques to enhance and detect the facial vasculature network of the volunteers.

1. Introduction

Skin forms the largest organ of the human body, skin accounts for about 16 per cent of a person’s weight. It performs many vital roles as both a barrier and a regulating influence between the outside world and the controlled environment within our bodies. Internal body temperature is controlled through several processes, including the combined actions of sweat production and the rate of blood flowing through the network of blood vessels within the skin. Skin temperature can be measured and visualized using a thermal infrared Fig.1 Superficial arteries and veins in the human face. camera with a reasonable sensitivity. Facial skin Courtesy of Primal Pictures [4]. temperature is closely related to the underlying blood vessels thus by obtaining a thermal map of the human 2. Methods face we can also extract the pattern of the blood vessels just below the skin. 2.1 Participants Thermal Infrared Imaging has many applications in different scientific, engineering, research, and medical For the purpose of this study we collected thermal areas. Different studies using thermal infrared imaging infrared images from 10 different subjects. Each subject have been done to detect spontaneous emotional facial was asked to sit straight in front of the thermal infrared expressions [10], skin tumors [3], to recognize faces camera and a snapshot of their frontal view was taken. and facial expressions [7], frustration [5] and other

This process was repeated at least two more times on different days and at different times of the day.

2.2 Equipment and Software

The primary equipment used for this study is a thermal infrared camera (Merlin™ InSb MWIR Camera, FLIR Systems) and a Microsoft Windows®-based PC. The thermal infrared camera communicates with the PC through a standard Ethernet and iPort grabber connection. The camera consists of a Stirling-cooled Indium Antimonide (InSb) Focal Plane Array (FPA) built on an Indigo Systems ISC9705 Readout Integrated Circuit (ROIC) using indium bump technology. The FPA is a 320 × 256 matrix of detectors that are sensitive in the 1.0 µm to 5.4 µm range. The standard camera configuration incorporates a cold filter that restricts the camera's spectral response to the 3.0 µm – 5.0 µm band. The camera has a 25 mm lens with a field of view of 22 × 16°.

The thermal sensitivity is 0.025 °C at 30 °C ambient temperature. The absolute temperature measured depends on different factors such as emissivity of the object, ambient temperature, and humidity. Relevant parameters can be changed in the software (ThermaCAM™ Researcher V2.8 SR-1) provided by FLIR Systems. The temperature accuracy is ±2 °C and ±2% of reading if all the variables (emissivity, temperature and humidity) are correctly set. In the ThermaCAM™ software the following values were used for the video recording:
• Emissivity: 0.98
• Distance: 1.2 m
• Relative humidity: 50%
• Temperature: 23 °C

The thermal infrared images obtained are then processed using MATLAB and FSL software for image registration.

2.3 Design of the Experiment

The recording of the thermal infrared images was done in a room with a mean room temperature of 23 °C. The infrared camera was placed on a tripod. The subjects were asked to sit straight on a stationary chair with a headrest to avoid any head movement. The chair was placed 1 m in front of the thermal infrared camera. The subjects were asked to look straight at the lens and a snapshot of their frontal view was taken.

2.4 Image Processing

The feature extraction process consists of four main steps: face segmentation, removal of unwanted noise, image morphology, and blood vessel segmentation post-processing.

Fig.2 Thermal infrared image of a volunteer.

2.4.1 Face Segmentation

In this step the face of the subject was segmented from the rest of the image. The segmentation process was achieved by implementing the technique of localizing region-based active contours, in which typical region-based active contour energies are localized in order to handle images with non-homogeneous foregrounds and backgrounds [9].

Fig. 3 Face segmentation.

2.4.2 Noise Removal

After the face was segmented from the rest of the thermal infrared image we proceeded to remove unwanted noise in order to enhance the image for further processing.

Noise removal from the image was achieved by using an anisotropic diffusion filter [6]. This processing step smoothes regions while preserving and enhancing the contrast at sharp intensity gradients.
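A compact way to see the effect just described is a basic Perona-Malik diffusion loop. This is a generic sketch of the filter in [6], not the code used in the study (which was implemented in MATLAB); the conductance function, the boundary handling, and the parameter values (n_iter, kappa, gamma) are assumptions chosen only for illustration.

```python
import numpy as np

def anisotropic_diffusion(img, n_iter=20, kappa=30.0, gamma=0.2):
    """Perona-Malik diffusion: smooths homogeneous regions while
    preserving strong intensity gradients (edges)."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # Finite differences toward the four neighbours (wrap-around
        # boundaries are used here only for brevity).
        dn = np.roll(u, -1, axis=0) - u
        ds = np.roll(u,  1, axis=0) - u
        de = np.roll(u, -1, axis=1) - u
        dw = np.roll(u,  1, axis=1) - u
        # Edge-stopping (conductance) function, exponential variant.
        cn = np.exp(-(dn / kappa) ** 2)
        cs = np.exp(-(ds / kappa) ** 2)
        ce = np.exp(-(de / kappa) ** 2)
        cw = np.exp(-(dw / kappa) ** 2)
        u += gamma * (cn * dn + cs * ds + ce * de + cw * dw)
    return u

# Usage (hypothetical): denoised_face = anisotropic_diffusion(segmented_face)
```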

Fig.4 Facial vasculature without noise removal.

2.4.3 Image Morphology

Morphological operators are based on set theory, the Minkowski operators, and De Morgan's law. Image morphology is a way of analyzing images based on shapes; in our case we assume that the blood vessels are tubule-like structures running along the length of the face. The operators used in this experiment are opening and top-hat segmentation.

The basic effect of an opening operation is to remove some of the foreground (bright) pixels from the edges of regions of foreground pixels. The effect of the operator is to preserve foreground regions that have a similar shape to the structuring element, or that can completely contain the structuring element, while eliminating all other regions of foreground pixels.

The top-hat segmentation has two versions; for our purpose we use the version known as white top-hat segmentation, as this process enhances the bright objects in the image. This operation is defined as the difference between the input image and its opening. The result of this step is to enhance the maxima in the image.

2.4.4 Blood Vessel Segmentation Post-Processing

After obtaining the maxima of the image we skeletonize the maxima. Skeletonization is a process for reducing foreground regions in an image to a skeletal remnant that largely preserves the extent and connectivity of the original region while throwing away most of the original foreground pixels.

After having extracted the facial vasculature features, the FSL software is used to do image registration so as to compare those features that remain the same in the vasculature network when the images are taken at different times.

Fig.5 Final facial vasculature obtained after executing the four steps of the feature extraction process.

3. Results

The process outlined in section 2.4 of this paper is done to every thermal image obtained from different subjects. Figures 6, 7, and 8 show the result of doing image registration to the final facial vasculature of three volunteers.

Fig.6 Image registration of final vasculature, including the neck, using four images.
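The morphological steps of Sections 2.4.3 and 2.4.4 map closely onto standard operators available in scikit-image. The snippet below is a hedged sketch under that assumption: the structuring-element size and the threshold applied before skeletonization are illustrative choices, not the values used in the study, and the study itself was implemented in MATLAB rather than Python.

```python
from skimage import morphology

def vessel_features(denoised, selem_radius=5, threshold=0.02):
    """Enhance bright tubular structures and reduce them to a skeleton."""
    footprint = morphology.disk(selem_radius)

    # Opening removes small bright details that do not fit the footprint.
    opened = morphology.opening(denoised, footprint)

    # White top-hat = image minus its opening: keeps the bright maxima
    # (the warm, vessel-like ridges) and suppresses the smooth background.
    tophat = morphology.white_tophat(denoised, footprint)

    # Binarize and skeletonize to obtain the vasculature trace
    # (the post-processing step of Section 2.4.4).
    binary = tophat > threshold
    skeleton = morphology.skeletonize(binary)
    return opened, tophat, skeleton
```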

75 different times for a same individual which proves the critical aspect of repeatability in the results. The research will continue to develop new algorithms to remove fake vasculature contours, generate a matching score that will quantify the similarity between the input and the database template representation. The implementation of a biometric system will also be part of our future work and the creation of a larger database for the testing of the implemented feature matching algorithms. Acknowledgements The authors appreciate the support provided by the National Science Foundation grants CNS-0540592 CNS- 0837556, HRD-0833093, and CNS-0426125.

References 1. Buddharau, P., Pavlidis, I.T., Panagiotis, T., Fig. 7 Image registration of final vasculature, including Bazakos, M., “Physiology-Based Face Recognition the neck area, using two images. in the Thermal Infrared Spectrum”, IEEE

Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, No. 4, pp. 613-626, April 2007. 2. Diakides, N. & Bronzino, J., Medical Infrared Imaging (CRC Press, Taylor and Francis Group, FL, 2008). 3. Mital, M. & Scott, E., “Thermal Detection of Embedded Tumors Using Infrared Imaging,” Journal of Biomechanical Engineering, Vol. 129, February 2007, 33-39 4. Moxham, B.J., Kirsh, C., Berkovitz, B., Alusi, G., and Cheeseman, T., Interactive Head and Neck (CD-ROM). Primal Pictures, Dec. 2002. 5. Pavlidis, I., Dowdall, J., et. al., “Interacting with Human Physiology,” Computer Vision and Image Understanding, Vol. 108 (2007), 150-170. 6. Perona, P., Malik, J., “Scale-Space and edge Detection Using Anisotropic Diffusion”, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 12 (7), pp. 629-639 , July 1990. 7. Socolinsky, D., Selinger, A., Neuheisel, J., “Face Recognition with Visible and Thermal Infrared Imagery,” Computer Vision and Image Fig. 8 Image registration of final vasculature, including Understanding, 91(2003), 72-114. the neck, using two images. 8. Straume, A., Oftedal,, G., Johnson, A., “Skin 4. Conclusions Temperature Increase Caused by a Mobile Phone: A Methodological Infrared Camera Study,”

Bioelectromagnetics 26(2005), 510-510. The presented results show that thermal infrared images 9. Lankton, S., Tannenbaum, A., “Localizing Region- allow us to extract the facial vasculature through Based Active Contours”, IEEE Trans. on Image different image processing techniques. The image Processing, Vol. 17(11), pp. 2029-2039, 2008. registration results also show that the facial vasculature 10. Zeng, Z., “Spontaneous Emotional Facial among individuals is unique and that there is little Expression,” Detection, Journal of Multimedia, change on its structure when the images are taken at Vol.1, No. 5, August 2006, 1-8.

76 An Incremental Approach to Performance Prediction

Javier Delgado and Malek Adjouadi Center for Advanced Technology and Education Department of Electrical & Computer Engineering Florida International University, Miami, FL 33174, USA Emails: [email protected], [email protected]

Abstract • Works on modern, commercial processors (e.g. multicore). Use of computing systems has traditionally been • Works on systems with a large deadline driven. The resurgence of lease-based access number of nodes. to computing systems is particularly deadline driven. • Tested on several applications. To ensure users’ quality of service requirements are • A set of tools that can be deployed on these met and computing center resource usage is maximized, kinds of systems to provide the performance performance prediction is essential. The rapid predictions. development and increasing complexity of today’s computing systems has made performance prediction The overall scenario being addressed is shown in challenging. In this work, we describe our approach to Figure 1. Users provide the system with an application performance prediction, which combines human and a time constraint, the system determines if it is knowledge with application modeling techniques to possible to satisfy the constraint. accurately predict the performance of a weather simulation application without being overly resource intensive.

1. Introduction

Many current computing applications require knowledge about the execution time of an application. For resources that are time shared among many users, such as in high performance computing (HPC) centers and cloud computing providers’ data centers, Figure 1. A performance prediction methodology is necessary for performance prediction is beneficial for allocating determining which direction to go to at the two decision points in this resources based on what is known about the job a user scheduling flowchart. needs to run. For real-time systems, schedulers need to know what the execution time of tasks are in order to 2. Background decide if a task set is feasible. Due to the rapid development and increasing 2.1 Overview complexity of computing systems, performance prediction even on single processor systems is still an There are two general ways to model application ongoing research problem. Several groups are looking execution, static and measurement-based. Static into this problem. Some are using general approaches analysis requires in-depth information about the [1][2] and others are using more application-specific architecture being modeled. The complexities and rapid approaches, for example for HPC [3]. development time of processors make it difficult to The goal of this work is to develop a performance achieve good accuracy using static analysis. For prediction paradigm that can be used with different example, in [4] the authors explain how difficult it was applications. Towards this end, two main contributions to obtain good accuracy on a simulator for a processor are provided. that they designed themselves (and had an actual • An execution time prediction methodology hardware implementation of). Measurement-based that provides high accuracy and low analysis techniques use historical data about application computational overhead, with the following execution to estimate future execution times. Pure characteristics measurement-based approaches that do not take into account the execution platform and/or application input

77 are overly naïve. The proposed approach is a hybrid reported by the operating system during application approach that uses measurements as its basis, but also execution. For more specific, low-level measurements, relies on human knowledge to help guide the the Perfsuite performance monitoring framework is prediction. used. The two tools developed are described in detail in 2.2 Related Work [6]. The following short summary describes how they are used. There are several groups working on performance • amon – A MONitoring tool which collects prediction. For this work, we focus on HPC system and process-related data while an applications, so in this section we give an overview of application is running. performance prediction approaches for HPC • aprof – A PROFiling tool that builds an applications that use methodologies similar to ours. One application profile out of the amon data and such approach is described in [3]. The authors use implements a prediction model. regression analysis to predict the scalability of The current prediction model is based on the applications running on a large cluster. Our work assumption that execution time can be modeled as a differs from theirs in that we use external monitoring function of the parameters that contribute to it (i.e. the tools and we perform cross-cluster predictions in parameters of the application profile). We use a simple addition to scalability prediction. The authors of [5] formula that models the execution time as the product describe a user-assisted prediction approach that of resource consumption parameters. clusters applications based on a set of parameters that the users believe are indicative of execution time, and use a statistical function to estimate the execution time Mathematical of a given application after it has been assigned to one Modeling (B) of the clusters. Our approach differs from this one in that there is no clustering involved. Instead, the chosen parameters are measured and used to form an equation Application Application for execution time prediction. Analysis (A) Profiling (C)

3. Approach Parameter The methodology used is a four-step iterative Estimation (D) process. Figure 2 provides a graphical depiction of the process. The four steps are iterated until acceptable accuracy is achieved. We consider less than 10% Figure 2. Overview of the performance prediction methodology. accuracy to be acceptable. Equation (1) shows the prediction formula. In the The first step of the modeling approach is the formula, the a term is the contribution coefficient for Application Analysis stage. The process for this stage is ij the j-th polynomial of the i-th consumption parameter as follows: (z ). 1. Study the parameters of the execution domain i and software. (1) 2. Form an application profile for each application being modeled, which is a vector of parameters that contribute significantly to The model is evaluated by comparing the measured the application’s execution time. Examples of execution times to the predicted execution time, using profile parameters include CPU clock speed, Equation (2). number-of-cores, number-of-inputs, number- of-FLOPS, and application input. (2) Once the application profile parameters have been 4. Results identified, they need to be measured during application execution to test whether the hypothesis about their 4.1 Overview of Experiments significance was correct. In order to do so, some kind of instrumentation tool needs to be used. Several instrumentation tools exist, but most of them must be The application selected for the first set of linked to applications and/or explicitly called during validations was the weather research and forecasting application execution. We developed non-intrusive (WRF) application. WRF is a good representative of tools that gather system-wide parameters that are HPC applications. WRF also has characteristics of a

78 real-time system: if a weather simulation is not provided with adequate time left to prepare for a coming weather-related emergency, it is useless. The accuracy of the model with similar systems and a relatively low number of nodes was well established [6]. The problem being addressed now is prediction with larger, multicore systems. Multi-node executions on multi-core systems are harder to model Figure 3. Actual versus predicted execution times for two WRF due to the large difference in execution behavior when simulations run on Abe. executing on separate cores in different systems as opposed to separate cores on the same system. Sometimes, having the cores on the same system is an 4. Conclusions advantage since memory that needs to be accessed by more than one core is accessible directly. However, less We have described the use of a methodology for memory is available for each core and several predicting application execution time using a components must be shared by the cores, thus creating a mathematical approach. High accuracy is obtained negative impact on performance. when performing predictions across similar systems and/or application inputs. The accuracy decreases as As a result, it was necessary to determine which then number of variables increases. As a result, we are parameters to use in the application profile in order to currently looking at ways to refine the model in order achieve high accuracy. The resulting parameters were for it to deal with this shortcoming. number-of-nodes, cores-per-node, read/write memory bandwidth, network bandwidth, and a platform Acknowledgements contribution. The latter term is used for cross-platform This work was supported in part by the National predictions. It is a measure of how powerful the Science Foundation (grants OISE-0730065, OCI- processor is for certain applications. 0636031, HRD-0833093, IIS-0552555, HRD-0317692, In our previous work [6], processor clock speed CNS-0426125, CNS-0520811, CNS-0540592, and IIS- was an acceptable measure for the platform 0308155). Presentation of the poster associated with contribution. Due to the large difference among the this paper was supported in part by NSF Grant CNS- processor architectures of the systems modeled for this 0837556. work, a more sophisticated parameter had to be determined. We used the score of the NAS Parallel References Multizone benchmark [7] for this purpose. 1. Marc Casas, Rosa M. Badia, Jesus Labarta. Several systems were used to test our “Prediction of Behavior of MPI Applications.” In methodology. Up to 64 nodes and 4 cores per node Proceedings of IEEE Cluster 2008, 2008. were used in all systems. A summary of the systems 2. Hassan Rasheed, Ralf Gruber, and Vincent Keller, can be seen in Table I. “IANOS: An intelligent application oriented scheduling middleware for a HPC grid”. TABLE I. SYSTEMS USED TO TEST OUR METHODOLOGY CoreGRID. Tech. Rep. TR-0110, 2007. Host Cores/ Max CPU Interconnect 3. Bradley J. Barnes, Barry Rountree, David K. Name Node Nodes Lowenthal, Jaxk Reeves, Bronis de Supinski, and Xeon Nocona Mind 2 16 1GigE 3.6GHz Martin Schulz. “A regression-based approach to Xeon scalability prediction.” In Proceedings of the 22nd Abe Clovertown 8 64 10GigE annual international conference on 2.33GHz Supercomputing, 2008, pp. 368–377. Mare- Power 970MP 4 128 Myrinet nostrum 2.3GHz 4. Jeff Gibson , Robert Kunz , David Ofelt , Mark Horowitz , John Hennessy , Mark Heinrich, FLASH vs. 
(simulated) FLASH: closing the 2.2 Validation simulation loop, ACM SIGPLAN Notices, v.35 n.11, p.49-58, Nov. 2000. Several experiments were performed to test the 5. Warren Smith , Ian T. Foster, and Valerie E. model. Figure 3 shows the results obtained on Abe Taylor, “Predicting application run times using when using up to 64 nodes and up to 4 cores per node. historical information,” Proceedings of the The data inputs consisted of three sets of execution data for each of the runtime configurations used.
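Equations (1) and (2) referenced in Section 3 did not survive extraction in this copy, so the sketch below only illustrates the kind of model the text describes: execution time expressed through per-parameter polynomial terms with coefficients a_ij over consumption parameters z_i, combined multiplicatively, and evaluated by comparing predicted against measured times. The exact functional form, the coefficient values, and the error metric are assumptions, not the authors' Equations (1) and (2).

```python
def predict_time(z, coeffs):
    """Assumed multiplicative polynomial model: each consumption parameter
    z_i contributes sum_j a_ij * z_i**j, and the contributions multiply."""
    t = 1.0
    for z_i, a_i in zip(z, coeffs):
        t *= sum(a_ij * z_i ** j for j, a_ij in enumerate(a_i))
    return t

def relative_error(measured, predicted):
    """Assumed evaluation in the spirit of Equation (2): percent deviation
    of the predicted execution time from the measured one."""
    return 100.0 * abs(measured - predicted) / measured

# Hypothetical profile: z = (nodes, cores_per_node, platform_contribution)
z = (16, 4, 2.3)
coeffs = [(120.0, -1.5), (1.0, 0.4), (1.0, -0.1)]   # illustrative a_ij values
pred = predict_time(z, coeffs)
print(pred, relative_error(measured=310.0, predicted=pred))
```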

79 Workshop on Job Scheduling Strategies for Parallel Processing, 1998, pp.122-142 6. S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa, Raju Rangaswami, Javier Delgado, Hector Duran, and Xabriel Collazo, “A modeling approach for estimating execution time of long-running scientific applications.” In Proc. 22nd IEEE International Parallel & Distributed Processing Symposium, the Fifth High-Performance Grid Computing Workshop, 2008. 7. R. van der Wijngaart and H. Jin, “NAS Parallel Benchmarks, Multi-Zone Versions,” NASA Ames Research Center, Moffett Field, CA. Tech. Rep. NAS-03-010,2003.

80 Using Genetic Algorithms on Scheduling Problems

Ricardo Macias Computer Science Department California State University, Dominguez Hills Email: [email protected]

Abstract

Genetic algorithms are optimal search methods that emulate evolutionary processes. They perform operations that are associated with reproduction in biological organisms by scanning a solution space and producing an ideal set of solutions. In computer science, they are used to solve complex problems and provide an optimal solution [3]. The genetic algorithm is also used in other problem domains outside of computer science, including economics, sociology, chemistry, and computer networks [4]. In my research, the goal is to implement genetic algorithms as a problem-solving strategy for the Resource Constrained Project Scheduling Problem. Projects often have the dilemma of limited resources and changing task requirements during a project life cycle. The idea is to generate a project schedule where tasks will be completed at a given time and with allocated resources by considering factors that may affect the project completion. Although other techniques, including variations of the Genetic Algorithm, have been used, the goal is to find and introduce a new method to incorporate into the Genetic Algorithm to solve this problem. Currently, the research is an ongoing project, and the problem will be further investigated.

1. Introduction

The Resource Constrained Project Scheduling Problem is defined as a set of n activities, where the first and last activities are "dummy" activities that represent the beginning and the end of a project, as described by [1]. Each activity has an associated set of elements, duration and renewable resources, so that each project, except for the "dummy" variables, is assigned a specific number of resources and time slots to be completed. Resources may change throughout the life cycle of the project, with certain projects taking precedence over others, and projects requiring a specific amount of resources. The objective of the resource constrained project scheduling problem is to find a schedule of activities where precedence and resource constraints are satisfied, so that schedule duration is satisfied [1]. This problem has been studied by many researchers using different approaches. Vicente et al. used a hybrid genetic algorithm that uses a "peak crossover operator", so that it "exploits the knowledge of the problem to identify and combine the good parts of a solution that contribute to the quality [1]." The work performed by Mendes et al. [2] uses a random key based genetic algorithm, so that a set of elements is generated using random numbers between 0 and 1; each set of elements is a feasible solution [2]. Genetic algorithms have received profound attention. Researchers are applying genetic algorithms to various scheduling problems. The goal of this research is to further investigate scheduling problems, specifically the Resource Constrained Project Scheduling Problem, and extend the implementation with a new method. To date, this research is in the beginning stages; it is an ongoing project that will continue to investigate the problem.

2. Proposed Method

Our method is to reduce the complexity of the problem, so that given the Resource Constrained Project Scheduling Problem, we will focus on a specific element of the problem, particularly time constraints. So, our problem will be reduced to a time scheduling problem. There are certain elements in genetic algorithms to consider [5]. Our algorithm will evaluate a set of candidate elements, or potential solutions, and perform a crossover operation such that a pair of solutions are selected and allowed to exchange information, so that a different solution is generated [6]. Another operator, called a mutation, will modify each solution to generate a different solution from the previous one. Solutions will be selected based on specific criteria; a fitness function will determine which solutions are selected for the crossover and mutation process [7]. Genetic algorithms perform these operations in such a way that they emulate the natural functions in biological systems, developing an ideal solution to a complex problem.
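As a concrete illustration of the operators just described (candidate solutions, a fitness function, crossover, and mutation), here is a minimal genetic algorithm for a reduced time-scheduling problem: each chromosome is a permutation of task indices, and fitness penalizes finishing after each task's due time. The encoding, the fitness definition, and all parameter values are illustrative assumptions, not the method under development in this research.

```python
import random

durations = [3, 2, 4, 1, 5]          # hypothetical task durations
due_times = [4, 6, 10, 11, 16]       # hypothetical deadlines

def fitness(order):
    """Smaller total tardiness -> higher fitness."""
    t, tardiness = 0, 0
    for task in order:
        t += durations[task]
        tardiness += max(0, t - due_times[task])
    return -tardiness

def crossover(p1, p2):
    """Order crossover: keep a slice of p1, fill the rest in p2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    rest = [t for t in p2 if t not in child]
    for i in range(len(child)):
        if child[i] is None:
            child[i] = rest.pop(0)
    return child

def mutate(order, rate=0.2):
    """Swap two tasks with a small probability."""
    order = order[:]
    if random.random() < rate:
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    return order

population = [random.sample(range(len(durations)), len(durations))
              for _ in range(30)]
for _ in range(100):                              # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                     # selection by fitness
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(20)]
    population = parents + children

best = max(population, key=fitness)
print(best, fitness(best))
```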

81 3. Ongoing Research problem, European Journal of Operational Research, vol. 185, 2008, pg 495-508. A random key based genetic To date, our research is in the initial stages of 2. Mendes, J. J., et. al., algorithm for the resource constrained project development. We expect to develop concrete results scheduling problem Computers and operations that will meet our goals. , research, vol. 36, 2009, pg. 92-100. 3. Goncalves, J. F. et. al., A genetic algorithm for the resource constrained multi-project 4. Acknowledgement scheduling problem, European Journal of Operational Research, vol. 189, 2008, pg. 1171- This paper is based on work supported by the 1190. National Science Foundation (NSF) through grant 4. Mitchell, Melanie, An introduction to genetic CNS- 0837556. Any opinions, findings, and algorithms, MIT Press, 1998, conclusions or recommendations expressed in the paper 5. Man, Kim F., Genetic Algorithms: concepts and are those of the authors and do not necessarily reflect designs, Springer, 1999 the views of the NSF. 6. Michalewics, Zbigniew, Genetic Algorithms and Data Structures = Evolutionary Programming, Springer, 1996, 7. Zalzala, Ali M. S. & Fleming, Peter J., Genetic References algorithms in engineering systems, The Institution of Electrical Engineers, 1997 1. Valls, Vicente, et. al., A hybrid genetic algorithm for the resource-constrained project scheduling

82 Semantic Support for Weather Sensor Data

Jessica Romo and Cesar Chacon Faculty Mentor: Rodrigo Romero Cyber-ShARE Center of Excellence University of Texas at El Paso El Paso TX 79968, USA Emails: {jromo2, crchacon}@miners.utep.edu, [email protected]

1. Abstract problem is how to support searches that capture all Web-posted information from weather sensors items of interest and only those items. does not lend itself to automated searches and To enable better automated, more informed better informed queries, which would retrieve all searches of weather sensor data; this paper relevant information for a search and avoid proposes using Semantic Web technologies and providing irrelevant results. This paper focuses on methods. More specifically, this paper presents the the design and application of an ontology to design of an ontology, the Weather Sensor Data represent and semantically annotate weather (WSD) ontology, which describes produced data, sensor data for Web-posting. Using the ontology data relations, and data provenance for enabling presented here to organize and annotate sensor semantic annotation of web-posted data. This data will assist users and Web crawlers when paper describes the principal steps of the design of retrieving weather information. This paper also the WSD ontology and discusses a general presents the principal processing steps from data procedure to use the ontology. generation at sensors through posting at a semantically-annotated Web site. 3. Ontology Design An ontology is a specification of a 2. Introduction conceptualization of some knowledge domain Measurement of environmental parameters which, while not being necessarily comprehensive, such as temperature, pressure, and other weather- must be formal to support automated searches and related information generates vast amounts of inferences [1]. OWL and RDF are ontology sensor data. The John W. Kidd Memorial languages that allow the specification of classes, Seismological Observatory at the University of properties, and relationship between classes which Texas at El Paso (UTEP) contains instrumentation are necessary to represent knowledge. Classes are that measures and records temperature, humidity, concepts that may have subclasses to represent and pressure data. Some of the measured data is more specific concepts and super-classes to currently available in HTML format at a express more general concepts. Properties describe departmental website. However, posted features of the concepts. A set of instances of information does not lend itself to automated classes represents a knowledge base. searches and better informed queries based on data The first step to design an ontology is to provenance and other information such as determine the problem to be solved using the correlation of sensed data and weather events in ontology. The main purpose of the WSD ontology the region where the sensors are installed. is to organize and annotate sensor data to help Search engines look for items based on words users and crawlers to retrieve relevant weather associated to displayable or hidden html tags in information. crawled websites. Because of this, search engines The second design step is to identify classes tend to return many hits with information that is and their properties. Temperature, station irrelevant for a sought topic. Engines also fail to barometric pressure, altitude, sound, and time retrieve all elements related to search topics. The events are the main WSD ontology classes. Current temperature, dew point, wet-bulb and heat

83 index are sub-classes of temperature. Station 4. Applying the WSD Ontology pressure, altimeter setting, vapor pressure and The process followed to semantically annotate saturated vapor pressure are sub-classes of and post weather sensor data using the WSD pressure. Pressure and density are sub-classes of ontology is illustrated in Figure 1. altitude. Notice, however, that station pressure and altitude pressure are distinct concepts. Each of these classes has measurement type as a property. The knowledge base is composed of instances of the classes which contain the value measured and its measurement type. Instances of the temperature class have as measurement type Fahrenheit, Celsius, Kelvin, or Rankine. Instances of the pressure class have millibars, inches of mercury, millimeters of mercury, pascals or pounds per The process has the following phases: (1) sensing square inch as a measurement type. The instances and recording data, (2) storing the data in a of the altitude class have feet or meters as the database with a schema which includes entities measurement type. and attributes based on the WSD ontology, (3) The third design step is to consider reuse of perform calculations using stored data for display existing ontologies since building an ontology purposes, (4) semantically annotate the results of from scratch, which would be the fourth design the calculations and other retrieved information to step, is time-consuming, requires significant be posted to a website through the ontology, and collaboration efforts with domain experts, and (5) display web pages with semantic annotations. could require use of mediating ontologies for Following the process described above, interoperability with related ontologies within the temperature in Celsius degrees, pressure in same domain or in related domains. A search for millibars, and the percentage of relative humidity an ontology that captures weather information and are collected from the Kidd Observatory at UTEP at least partially includes the representation of during phase 1. The sensor hardware is contained sensor data to be recorded and processed using the in a Suominet Vaisala meteorological package WSD ontology, a DAML ontology for generating which includes a barometer, a temperature probe, weather reports was encountered at the Weather and a humidity probe [4]. Data is recorded every Agent website [2]. The ontology was modified 15 minutes and stored in a MySQL database. with the addition of classes and properties needed Calculations are implemented by a PHP program for the WSD ontology to represent weather data that calculates dew point, wet-bulb, heat index, recorded by the Kidd Seismological Observatory altimeter setting, vapor pressure, and altitude at UTEP. based on the data recorded by the sensors. Protégé [3], an OWL ontology editor, was In phase 4, the WSD ontology is used to used to transform the Weather Agent ontology into annotate data to be posted to assist crawlers in the WSD ontology. Protégé provides a friendly finding precisely weather data required by the user interface to add, edit, and remove classes, user. In phase 5, a semantically-enabled Web properties, and instances of an ontology. content management system, such as Semantic Therefore, there is no need to know the OWL MediaWiki, can be used to exploit semantic language to create or modify an ontology; an features defined by the WSD ontology. Semantic ontology to be modified can be imported into MediaWiki provides a language to create Protégé for edition. 
Classes needed by the WSD categories, properties, and annotations which ontology included wet-bulb, station pressure, correspond to ontology classes, properties and vapor pressure, saturated vapor pressure, and instances, respectively [5]. Semantic MediaWiki altitude. WSD properties and instances of these generates an RDF file for every page in a site. This new classes were also added. file can be modified to apply the WSD ontology.
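The class hierarchy and measurement-type property described above can be written down in OWL/RDF with a handful of triples. The fragment below uses Python's rdflib as a sketch of that structure; the namespace URI, the exact class, property, and instance names, and the extra value property are assumptions based on the descriptions in this paper, not the published WSD ontology.

```python
from rdflib import Graph, Namespace, Literal, RDF, RDFS
from rdflib.namespace import OWL, XSD

WSD = Namespace("http://example.org/wsd#")   # hypothetical namespace
g = Graph()
g.bind("wsd", WSD)

# Main classes and a few of the sub-classes named in the design steps.
for cls in ("Temperature", "Pressure", "Altitude", "Sound", "TimeEvent"):
    g.add((WSD[cls], RDF.type, OWL.Class))
for sub, parent in (("DewPoint", "Temperature"), ("WetBulb", "Temperature"),
                    ("HeatIndex", "Temperature"), ("StationPressure", "Pressure"),
                    ("VaporPressure", "Pressure")):
    g.add((WSD[sub], RDF.type, OWL.Class))
    g.add((WSD[sub], RDFS.subClassOf, WSD[parent]))

# Every measurement class carries a measurement type (e.g., Celsius, millibars).
g.add((WSD.measurementType, RDF.type, OWL.DatatypeProperty))
g.add((WSD.measurementType, RDFS.range, XSD.string))

# A knowledge-base instance: one temperature reading in degrees Celsius.
g.add((WSD.reading42, RDF.type, WSD.Temperature))
g.add((WSD.reading42, WSD.measurementType, Literal("Celsius")))
g.add((WSD.reading42, WSD.value, Literal(23.5)))     # "value" is a made-up helper property

print(g.serialize(format="turtle"))
```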

84

5. Conclusions This paper presents the design and an application process of an ontology for representing weather sensor data – the WSD ontology. The process described indicates the principal steps required for taking weather data from source sensors through a set of operations that produce semantically annotated data and information to support better automated, more informed searches of weather sensor data. Future work includes the application of the WSD ontology and the process outlined in this paper to publish weather sensor data in a Semantic MediaWiki website.

6. Acknowledgements The authors thank Mr. Leo Salayandia, a Research Specialist from the Cyber-ShARE Center of Excellence for advice on analysis of ontologies. This material is based upon work supported in part by the National Science Foundation Grants CREST HRD–0734825 and MRI–0923442. Presentation of this paper was supported in part by NSF Grant CNS–0837556.

7. References

1. T. Seragan, C. Evans, and J. Taylor, “Programming the Semantic Web”, O’Reilly Media, 1 edition, July 2009. 2. Weather Ontology, AgentCities@ Abardeen, University of Aberdeen, January 2002, http://www.csd.abdn.ac.uk/research/AgentCiti es/WeatherAgent/weather-ont.daml. 3. Protégé OWL Ontology Editor for the Semantic WEB, Stanford Medical Informatics, 2006, http://protege.stanford.edu/plugins/owl/ documentation.html. 4. SuomiNet GPS Receiver, Suominet Support, Cosmic, 2003, http://www.suominet.ucar.edu/ support/receiver.html. 5. Semantic MediaWiki, March 2008, http://semantic-mediawiki.org/wiki.

85 STg: Cyberinfrastructure for Seismic Travel Time Tomography

Cesar Chacon, Francisco Licea, and Julio Olaya Faculty Mentor: Rodrigo Romero Cyber-ShARE Center of Excellence University of Texas at El Paso El Paso TX 79968, USA Email: {crchacon, flicea, jolaya}@miners.utep.edu, [email protected]

1. Introduction end is controlled by a customizable script. This Seismic travel time tomography is a technique for prevents recompilation of the front end for using determining the crustal velocity structure of a completely different computational back ends for geographic region. Seismic tomography is a comparison or enhancement purposes [5]. However, the computationally intense procedure based on two major inclusion of a back end different from the one discussed conceptual steps: forward modeling [2][3] and in this paper would require the replacement to take as inversion [1]. However, overall computation time is input the XML project definition and to comply with dominated by subtasks associated with inversion. This the front-end protocol for monitoring the processing paper presents a software system, named STg, which progress. has a distributed processing architecture design for an implementation of seismic travel time tomography. STg’s front end, which is partly shown in Figures 1 and 3, enhances usability of the tomography software included as the distributed computational back end. STg front-end graphical interface provides multiple validations of the experimental output data to be used by the tomographic algorithm and decouples processing definition from the associated back-end computations. STg’s computational back-end, which implements the tomographic algorithm, comprises several executables that implement forward modeling as a single step and the inversion subtasks as a set of independently implemented steps. Through each iteration of the tomographic algorithm, a new set of output models is produced that can be used as the input of the next iteration or as the final models produced by the algorithm after model convergence. STg’s enables users to monitor execution of computational steps and progress of model convergence. Since the inversion procedure is nonlinear, obtaining a converging model can be elusive. This leads to a solution based on trial and error, where the ability to distribute computations Figure 1. Front end wizard for project definition to a high performance platform and to monitor processing and convergence progress are key for 2.2 XML Project Definition reducing the time needed to obtain a final tomographic model of the studied region. The XML tomographic project definition, as shown in Figure 2, serves as metadata that describes the 2. STg’s Multi-tier Design project inputs, outputs, and computation-control parameters to the back-end. The XML definition does not contain any of the actual data from the experimental

2.1 Front End measurements, which allows users to define projects To provide implementation and execution flexibility reusing experimental data with different modeling to the front and the back ends, both are independently parameters [4]. portable to other platforms and invocation of the back-

86 The XML project definition contains the following execution of the tomographic algorithm, which is an information in the project setup section: implementation of the Vidale-Hole algorithm for Username: Establishes a one-to-many ownership seismic travel time tomography [1][2][3] released by relationship between users and projects. J.A. Hole and documented as a workflow . The back Project Name: Encapsulates and distinguishes the set end, distributed to tiers t2 through tn, is a collection of of parameters used by different processing runs or executables that perform the computational steps projects. required by seismic tomography. Project Path: Provides the location of the experimental The tiers communicate using an XML-based measurement data and the project definition. This path tomographic experiment definition. An inter-tier also defines the location where the project communication protocol enables the front end to obtain computational outputs will be stored. information about processing progress from the back Iterations: States the number of iterations that will end and retrieve partial results for analysis. run. The number of interations, n, is used for estimating The multi-tier design facilitates the use of execution workload on the back end and for creating parallelization and distributed computing to improve the output hierarchy which is returned in n folders. execution time. The back end computations can be The back end extracts information from the project made using high-performance or distributed computing definition about where to store the results of the platforms such as symmetric multi-processors, beowulf- computations. type clusters, and fully distributed environments. Server: The server or hosts, tiers t2 through tn, will act as the computational back end of the project. Date: A timestamp with the creation date of the project definition that is added automatically by the front-end when the user saves a project. Active: This attribute is a Boolean value that indicates whether the project definition is complete, and it contains all the validated parameters required by the back end. If Active is false, it implies that the project definition contains partial metadata which is valid, but some of the project information is missing. ModelType: Two different starting configurations can be used to execute the steps of the seismic tomography process. ModelType specifies whether the starting Figure 3. Front end process monitoring configuration is a one-dimensional model or a three- dimensional one. 3. Results

STg’s runs produce crustal velocity models and related outputs which have been validated against results from independent executions of the tomographic software that implements STg’s computational back and a published velocity model [6]. The current implementation of STg is being used as the computation platform to process a seismic travel time tomography experiment at the Department of Geological Sciences of the University of Texas at El Paso. This utilization of STg will guide development Figure 2. Project definition and enhancements for future releases of this software.
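To illustrate the kind of metadata the project definition of Section 2.2 carries, the following sketch assembles an XML document with the setup fields listed there (Username, Project Name, Project Path, Iterations, Server, Date, Active, and ModelType) using Python's standard library. The element names, nesting, and example values are assumptions inferred from the field descriptions; the actual schema used by STg may differ.

```python
import xml.etree.ElementTree as ET
from datetime import date

def build_project_definition(username, name, path, iterations,
                             server, model_type, active=True):
    """Assemble an STg-style project definition (assumed element names)."""
    root = ET.Element("TomographyProject", attrib={"active": str(active).lower()})
    setup = ET.SubElement(root, "ProjectSetup")
    ET.SubElement(setup, "Username").text = username
    ET.SubElement(setup, "ProjectName").text = name
    ET.SubElement(setup, "ProjectPath").text = path
    ET.SubElement(setup, "Iterations").text = str(iterations)
    ET.SubElement(setup, "Server").text = server
    ET.SubElement(setup, "Date").text = date.today().isoformat()
    ET.SubElement(setup, "ModelType").text = model_type   # "1D" or "3D" starting model
    return ET.ElementTree(root)

tree = build_project_definition("jolaya", "rio_grande_rift", "/data/projects/rgr",
                                iterations=8, server="backend-t2",
                                model_type="1D")
tree.write("project.xml", xml_declaration=True, encoding="utf-8")
```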

3. Results

STg's runs produce crustal velocity models and related outputs which have been validated against results from independent executions of the tomographic software that implements STg's computational back end and against a published velocity model [6]. The current implementation of STg is being used as the computation platform to process a seismic travel time tomography experiment at the Department of Geological Sciences of the University of Texas at El Paso. This utilization of STg will guide development and enhancements for future releases of this software.

4. Conclusions

This paper presents STg – a software system for distributed computation of seismic travel time tomography models. While the design of STg's front end focuses on usability by simplifying specification of inputs and model processing parameters, the design of STg's back end is focused on reducing execution time of the tomographic algorithm by using a high-performance computational platform and on a robust implementation of the tomographic algorithm.

5. Future Work

Future work on STg will be based on user feedback and the impact of releasing the software to the seismic tomography community. However, it is envisioned that availability of new computational platforms for the back end and enhancement of the various processing steps of the tomographic algorithm will also justify future software releases.

6. Acknowledgements

This material is based upon work supported in part by the National Science Foundation under Grants CREST HRD-0734825, MRI-0923442, and CNS-0837556.

7. References

[1] J. A. Hole, "Nonlinear High-Resolution Three-Dimensional Seismic Travel Time Tomography," Journal of Geophysical Research, 97, 1992.
[2] J. Vidale, "Finite-difference calculation of travel times in three dimensions," Geophysics, 55, 1990.
[3] J. Vidale, "Finite-Difference Calculation of Travel Times," Bulletin of the Seismological Society of America, 78, 1988.
[4] P. R. C. da Silveira, C. R. S. da Silva, and R. M. Wentzcovitch, "Metadata management for distributed first principles calculations in VLab—A collaborative cyberinfrastructure for materials computation," September 2007.
[5] C. A. Zelt and P. J. Barton, "Three-dimensional seismic refraction tomography: a comparison of two methods applied to data from the Faeroe Basin," Journal of Geophysical Research, 1998.
[6] M. G. Averill, "A lithospheric investigation of the Southern Rio Grande Rift," Ph.D. dissertation, The University of Texas at El Paso, 2007, 229 pages; AAT 3273992.

Visualization of Inversion Uncertainty in Travel Time Seismic Tomography

Julio C. Olaya Faculty Mentor: Rodrigo Romero Cyber-ShARE Center for Excellence Department of Computer Science The University of Texas at El Paso, El Paso TX 79968, USA Emails: [email protected], [email protected]

Abstract

Seismic traveltime tomography is used to create velocity models for determining the velocity structure of the crust of the Earth. Using a tomographic algorithm comprised of forward modeling and inversion, this paper presents an approach utilizing vectorized model coverage for visualizing uncertainty of inversion. Since the tomographic algorithm used does not perform model uncertainty computations, the algorithm is executed within a Monte Carlo simulation to compute a sample of velocity models with arrival times perturbed according to a Cauchy probability density function. Inversion uncertainty visualization can be used by geologists to determine where intermediate and final tomographic velocity models will have the greatest sensitivity to traveltime uncertainty.

1. Introduction

Seismic travel time tomography is used to create velocity models that help to determine the velocity structure of the crust of the Earth. One important aspect of the computed models is the velocity uncertainty due to experimental measurement errors of travel times [1]. Using a tomographic algorithm designed and implemented by J. E. Vidale [7] and J. A. Hole [4] that uses forward modeling and inversion to compute seismic velocity models, this paper describes an effort to generate visualizations of inversion uncertainty, which is the main contributor to velocity uncertainty.

The Vidale-Hole algorithm considers first arrival travel times as perfectly accurate measurements. Thus, no uncertainty of travel times is included in the computations of either step of the algorithm. However, input travel-time measurements can be perturbed to model measurement uncertainty according to some density function, and the algorithm can be executed as many times as necessary, considering travel-time inputs as error-free measurements every time, to achieve a desirable level of confidence in uncertainty calculations. A Gaussian distribution with standard deviation equal to the picking error is commonly used, but a Cauchy distribution has also been used [1].

Create or read Initial Velocity Model
While (iterative step needed) do
  For (each source) do
    Compute First Arrival Time Model
  For (each receiver & source) do
    Compute Ray Coverage
    Compute Velocity Perturbations
  End of for
  Vel. Perturbation Initial Smoothing
  Vel. Perturbation Moving Average Smoothing
  Update Velocity Model
End of while
Output Velocity Model

Figure 1. Vidale-Hole's seismic tomography algorithm

For this work, travel-time uncertainty was modeled using a Cauchy distribution and Monte Carlo simulation, which generate velocity uncertainty models more in agreement with expected uncertainty [2]. This type of simulation can be used to compute the upper bounds of velocity uncertainty and provide a qualitative assessment of model sensitivity to measurement travel-time uncertainty.

Create or read Initial Velocity Model
Repeat N times
  Compute Cauchy distributed error values
  Apply error values to input pick times
  Execute Vidale-Hole's algorithm
  Determine the velocity model error
End repeat

Figure 2. Monte Carlo simulation

2. Methods

Vidale-Hole's seismic tomography algorithm, as shown in Figure 1, was used as a compound computational step of the Monte Carlo simulation [5], as shown in Figure 2.
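A minimal sketch of the perturbation step of Figure 2, assuming the first-arrival pick times are available as a NumPy array: Cauchy-distributed errors scaled by the picking error (0.150 s in the experiment described below) are added to the picks before each run of the tomography code, which is represented here by a placeholder callable.

import numpy as np

rng = np.random.default_rng(seed=0)

def perturb_picks(pick_times, pick_error=0.150):
    """Add Cauchy-distributed travel-time errors (scale = picking error, in seconds)."""
    errors = pick_error * rng.standard_cauchy(size=pick_times.shape)
    return pick_times + errors

def monte_carlo_models(pick_times, run_vidale_hole, n_runs=14):
    """Run the tomography code once per perturbed pick set (Figure 2).
    run_vidale_hole is a placeholder for the actual Vidale-Hole execution."""
    return [run_vidale_hole(perturb_picks(pick_times)) for _ in range(n_runs)]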

The tomographic inversion code was modified to generate the vectorized coverage needed for inversion uncertainty visualization. Then, vectors were post-processed to create graphic files in an XML format that is readable by Paraview, a software application for data analysis and visualization [6].

Based on calculations for a 95% level of confidence done by Averill [1], N was set to 14 in the Monte Carlo simulation and the pick error was set to 0.150 s. Thus, the combined vectorized coverage shows 14 ray paths per source-receiver pair. The experimental data set used was produced by the Potrillo Volcanic Field experiment of 2003 [1], which used 8 shot points and 793 receivers spaced at spans of 100 m, 200 m, and 600 m over a region with a length of 205 km.

3. Results

Visualizations of inversion results are shown in Figures 3 through 5. Figure 3 shows a combined visualization of standard hit-count final coverage output (shown in green) versus vectorized coverage of one of the experiment shot points with ray path uncertainty (shown in white). This figure provides qualitative confirmation of the correlation between standard coverage output and output with uncertainty.

Figure 3. Ray coverage with uncertainty

Figure 4 is a close-up view of ray paths around the shot point location where uncertainty of ray paths caused by uncertainty of travel-time values is clearly indicated. This figure, which is an orthographic projection along the Y axis, shows the variation of ray path locations as a cross-sectional view of a three-dimensional region that contains all the ray paths connecting each receiver to each shot point. If uncertainty were negligibly small, every receiver would be connected to every shot point by a single ray path. This is shown by the top receivers on both sides of the shot point.

Figure 4. Shot point with uncertainty

Figure 5 uses the same projection and zoom value as those used in Figure 4. However, Figure 5 also includes a visualization of coverage computed as cell hit count (shown in green). The latter is the reference coverage because it is computed with null uncertainty. The vectorized coverage shown indicates both ray path uncertainty and a more complete visualization of coverage than cell hit count, as the color intensity of the latter is proportionally mapped to hit count and some of the cells around the shot point have a very low value.

Figure 5. Vectorized and cell hit-count coverage around a shot point location

4. Conclusions

A visualization of inversion output from travel-time seismic tomography is generated to indicate inversion uncertainty due to travel-time uncertainty. Since inversion is used to generate each iteration of the velocity model, ray path uncertainty is directly correlated to model uncertainty. Thus, this kind of visualization can be used by geologists to determine where intermediate and final velocity models will have the greatest uncertainty caused by travel-time uncertainty.

Future work may include control of visualization by source-receiver pairs to enable analysis of coverage uncertainty without interference from undesirably rendered ray paths. In addition, representation of ray path uncertainty regions as three-dimensional shapes may allow a better indication of bounds to ray path uncertainty and provide a more accurate indication of velocity model sensitivity to travel-time uncertainty.

5. Acknowledgements

This material is based upon work supported in part by the National Science Foundation under grants CREST HRD-0734825, MRI-0923442, and CNS-0837556.

6. References

1. M. G. Averill, A lithospheric investigation of the Southern Rio Grande Rift, Ph.D. Dissertation, The University of Texas at El Paso, 2007.
2. M. G. Averill, K. C. Miller, V. Kreinovich, and A. A. Velasco, "Viability of travel-time sensitivity testing for estimating uncertainty of tomography velocity models: a case study," submitted to Geophysics, 2008.
3. L. Boshi, J. P. Ampuero, D. Peter, P. M. Mai, G. Soldati, and D. Giardini, "Petascale computing and resolution in global seismic tomography," Physics of the Earth and Planetary Interiors, Vol. 163 (1-4), pp. 245-250, 2007.
4. J. A. Hole, "Nonlinear high-resolution three-dimensional seismic travel time tomography," Journal of Geophysical Research, 97(B95), 1992, pp. 6553-6562.
5. P. Pinheiro da Silva, A. Velasco, M. Ceberio, C. Servin, M. G. Averill, N. Del Rio, L. Longpre, and V. Kreinovich, "Propagation and provenance of probabilistic and interval uncertainty in cyberinfrastructure-related data processing and data fusion," Proceedings of the International Workshop on Reliable Engineering Computing REC'08, Muhanna, R. L., and Mullen, R. L. (eds.), February 2008, pp. 199-234.
6. Paraview, http://www.paraview.org, Kitware Inc., Copyright 2008-2009.
7. J. E. Vidale, "Finite-difference calculation of traveltimes in three dimensions," Geophysics, 55(5), 1990, pp. 521-526.

Serious Games for 3D Seismic Travel Time Tomography

Ivan Gris Faculty Mentor: Rodrigo Romero Cyber-ShARE Center of Excellence University of Texas at El Paso El Paso, TX, 79968, USA Emails: [email protected], [email protected]

Abstract

Serious games and game technologies are commonly applied for training, education, and simulation environments. In this paper, we present an approach that uses game development technologies to enable analysis of Earth crust velocity models generated with seismic travel time tomography. We created a training application that was implemented as a serious game with a 3D interface facilitating exploration of tomographic processing outputs. The application features color-keyed extrapolation maps of tomographic experiments and real-time interactive visualizations that provide students and researchers an environment to explore and learn about the details of computed three-dimensional velocity models and related tomographic models. Utilizing an open source game engine within a three-layer architecture, the developed serious game provides a detailed model of a section of the crust of the Earth using a three-dimensional extrapolated map.

1. Introduction

As applications for computer simulation have gained acceptance in many commercial, industrial, and military areas, new applications have emerged. These applications have been used for "serious" training purposes as well as for education and entertainment [1]. Many types of serious games have been implemented for educational purposes in almost every area that requires training. Using games to learn has proven to be effective in businesses and schools. Although games have been created to teach within several scientific areas, serious games are often aimed at high school students or at proven experiments where the final solutions are known.

This paper presents a three-layer architecture for creating experimentally-driven 3D serious games. The produced game will assist geologists to create more accurate models of the crustal velocity structure of the Earth. In addition, the game will help geology students to understand the details of a travel-time seismic tomography algorithm and to create accurate velocity models in real time, while engaging in motivating learning experiences. Interactive three-dimensional visualization facilitates the interpretation of tomography results beyond the possibilities offered by standard orthogonal projections found in hard-copy renderings of models.

2. Game Engines

A computer game is a goal-directed and competitive activity that may involve some form of conflict [2]. A game induces independent decision making in users, as they must seek a successful approach to achieve objectives within the game context [3]. Games fall into a wide spectrum of genres, which are often combined to create games with the best elements of two or more genres. Thus, a game of the serious game genre can be enhanced with elements of the strategy and role-playing genres to create a well-rounded playing and learning experience for users, who are referred to as players. A distinguishing characteristic of serious games is that they have an explicit and carefully thought-out educational purpose and are not intended to be played primarily for amusement [3].

Good software design practice dictates that games, which have complex calculation requirements, an extensive array of features, or support for multiple, diverse input devices, be architected with a lower layer of software known as a game engine. A game engine abstracts the implementation details of tasks such as rendering, physics modeling, and low-level input/output processing. Use of third-party game engines allows game developers (graphic designers, musicians, story writers, level designers, scripters, and programmers) to focus their efforts on the details that make their games unique [4][5].

Game engines also facilitate rapid prototyping of conceptual ideas with computationally intensive simulation environments and complex game dynamics. Typical engines include reusable components to implement model handling and display, collision detection, physics, multi-device input, graphical user interfaces, and artificial intelligence. In contrast, the components that comprise the content of the actual game include the meaning behind object collisions, the responses to player input, and the way objects interact with the world within each level [4]. An ancillary type of game engine, referred to as a middleware engine, specializes in specific functions and works in conjunction with the main game engine. While keeping within the frame budget, middleware engines perform computationally intensive tasks with a higher degree of realism and more varied functionality than a main engine. Middleware engines typically implement physics, artificial intelligence, path finding, cloth simulation, destruction simulation, and user interfaces [6].

3. Seismic Travel Time Tomography

Seismic travel time tomography is a technique used to investigate the velocity structure of the Earth's crust. The algorithm presented in this paper is an iterative application of forward modeling and non-linear inversion designed and implemented by J. E. Vidale [7] and J. A. Hole [8], respectively. Forward modeling uses the eikonal equation to compute a discrete three-dimensional first-arrival traveltime model for each seismic wave source included in a tomographic experiment. The inversion procedure uses the forward modeling output to compute the source-to-receiver wave propagation path and the first arrival time for each source and receiver pair of an experiment. The algorithm concludes when experimentally measured first arrival times and propagation times calculated during inversion agree within root-mean-square values of measurement error. The final outputs of the tomographic algorithm include a three-dimensional velocity model and seismic ray coverage of the model.
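As a small illustration of the stopping rule just described, assuming picked and ray-traced first-arrival times are available as arrays (this is only a sketch of the criterion, not the Hole code's actual test):

import numpy as np

def rms_misfit(observed_times, predicted_times):
    """Root-mean-square difference between picked and ray-traced first-arrival times."""
    residuals = np.asarray(observed_times) - np.asarray(predicted_times)
    return float(np.sqrt(np.mean(residuals ** 2)))

def converged(observed_times, predicted_times, measurement_error):
    """Stop when the RMS misfit falls within the measurement (picking) error."""
    return rms_misfit(observed_times, predicted_times) <= measurement_error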

4. Game Architecture

The main goals of the serious game for seismic tomography discussed in this paper are three-dimensional display and manipulation of velocity models and associated tomographic models, interactive visualization of intermediate and final models, and a minimalist feature set to enable system operation without a prolonged learning curve. These game-inspired goals contrast with the lacking visualization capabilities, the text-based semi-batch mode of execution, and the lengthy running time of the underlying tomographic software, which as a whole discourage user experimentation with incremental model validation and evaluation of what-if scenarios.

The system design is based on the three-layered architecture illustrated in Figure 1. The game engine implements the bottom layer. The middle layer, where most games use a middleware engine, implements the Vidale-Hole seismic tomography algorithm. The top layer implements visualization and model manipulation control. The player's main goal is to adjust processing parameters until the seismic tomography algorithm converges to a valid velocity model, while subgoals include qualitative correctness of intermediate results and a general trend of intermediate results toward velocity model convergence.

Figure 1. Three-layered architecture of the serious game for seismic tomography: an educational layer (functions to support game mechanics and model validation) on top of a middleware layer (3D seismic tomography as the middleware engine), which in turn sits on the game engine layer (jMonkey).

5. Game Implementation

The bottom and the middle layers of the game are fully functional third-party building blocks, which are advantages of using a game engine and an existing implementation of a seismic tomography algorithm. Thus, most of the implementation effort of the serious game was dedicated to the top layer to meet game design and implementation goals.

Depending on model size, the game offers interactivity and frame generation rates on the order of one frame per second, which compares favorably with the alternative of manual display of the final velocity model after approximately an hour of processing. In addition, visualizations are color-keyed to indicate model details such as cell hit count when rendering coverage, as shown in Figure 4, which compares favorably with respect to the usual manually generated fixed 2D orthographic projection, gray-scale rendering of the same information. The serious game implemented creates a rich 3D environment that is fully navigable in real time to support and encourage interactive model exploration. The game also implements a spatial extrapolation in several color schemes that provide a simple interface for displaying ray concentration per cubic kilometer.

The game was written in Java using jMonkey as the game engine to benefit from portability at the source code, CPU architecture, and OS/GUI levels [9]. jMonkey also provides a camera implementation, graphics primitives, networking capabilities, and interactive frame-generation performance.

The top layer controls creation of the tomographic model 3D grids by traversing the model in slices and associating tomographic output information with each cubic kilometer. The seismic tomography algorithm outputs one file per model, which contains binary values of model vertex or cell information according to each model. Output models represent crustal velocity information in a left-handed coordinate system, but the positive Z direction, which corresponds to depth, actually points up in model coordinates. Visualizations are based on a set of display transformations that result in crustal depth, the Z axis, as shown in Figure 2, correctly increasing downward and starting from zero at the surface; horizontal measurement along the X axis increasing from left to right; and horizontal measurement along the Y axis increasing from front to back. Model X-axis or Y-axis orientation with respect to Earth north is determined by the user and is independent from the tomographic experiment layout.

Figure 2. Representation of 3D space with arrows pointing to the positive direction.

Using generation of model coverage visualization as an example, which is shown in Figure 3, every time a primitive is aggregated to represent a model cell, a red, green, or blue value is assigned to the primitive according to the number of seismic wave rays passing through it. Where there is a zero in the model file, no rays travel through the cell and no primitive is added to the visualization, to avoid cluttering the coverage display. The game takes a coverage binary file, translates cell hit values to integer form, and creates a primitive at the cell position with a color that maps to the cell hit count. After the entire model is loaded, the user gets control of the camera to explore the model in real time.

Figure 3. Implementation steps to visualize 3D model coverage: (1) coverage binary output is translated to integer data; (2) for every value not equal to zero, a cube is displayed in 3D space and assigned a color; (3) the user interface allows the user to explore the model by rotating and translating the camera.
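The top layer itself is written in Java on the jMonkey engine, but the loop of Figure 3 can be sketched independently of the engine as follows. The binary record layout, byte order, cell ordering, and color ramp are assumptions made for illustration, since the paper does not specify the coverage file format.

import struct

def hit_count_to_rgb(hits, max_hits):
    """Simple illustrative ramp: low hit counts render blue, high counts render red."""
    t = min(hits / max_hits, 1.0)
    return (t, 0.2, 1.0 - t)

def load_coverage_primitives(path, dims):
    """Read a coverage volume of dims=(nx, ny, nz) 32-bit integers (assumed layout)
    and return (cell position, color) pairs, skipping empty cells as in Figure 3."""
    nx, ny, nz = dims
    with open(path, "rb") as f:
        raw = f.read(4 * nx * ny * nz)
    counts = struct.unpack("<" + "i" * (nx * ny * nz), raw)   # little-endian ints assumed
    max_hits = max(counts) or 1
    primitives = []
    for index, hits in enumerate(counts):
        if hits == 0:
            continue                          # no rays cross this cell: add no primitive
        x, rem = divmod(index, ny * nz)       # assumed x-major cell ordering
        y, z = divmod(rem, nz)
        primitives.append(((x, y, z), hit_count_to_rgb(hits, max_hits)))
    return primitives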

6. Conclusions

This paper describes the design and implementation of a serious game to visualize, analyze, and control the outputs of an implementation of first-arrival travel time seismic tomography. The game offers interactive, color-keyed, three-dimensional visualizations of seismic tomography outputs to encourage students and researchers to learn about and analyze the details of produced three-dimensional velocity models and related tomographic models.

The game enables model visualizations as output is generated, which is in contrast with batch computations and subsequent manual visualization generation associated with standard use of the underlying seismic tomography software.

Since the tomography software runs several iterations and may take a few hours to complete for complex models, real-time visualization of iteration outputs will speed up model exploration for qualitative validation and convergence assessment.

Figure 4. A sample ray coverage output.

Future work may enhance the game with the addition of visual clues of lack of model convergence, parallelization of the seismic tomography software, and selective control of visualized models.

Acknowledgements

This material is based upon work supported in part by the National Science Foundation grants CREST HRD-0734825 and MRI-0923442. Presentation of this paper is supported in part by NSF grant CNS-0837556.

References

[1] V. Narayanasamy, K. W. Wong, C. C. Fung, and S. Rai, "Distinguishing Games and Simulation Games from Simulators," ACM Computers in Entertainment, Vol. 4, No. 2, April 2006.
[2] L. Sauve, L. Renaud, and D. Kaufman, "Games and Simulations – Theoretical Underpinnings," Proceedings of the Digital Games Research Association Conference, Vancouver, B.C., 2005.
[3] C. Abt, Serious Games, 1970.
[4] J. Ward, "What is a Game Engine," http://www.gamecareerguide.com, as seen in 2008.
[5] R. E. Pedersen, Game Design Foundations, 2nd Edition, 2009.
[6] "AI, Cloth and Physics," Havok.com Inc., 2009.
[7] J. E. Vidale, "Finite-Difference Calculation of Travel Times in Three Dimensions," Geophysics, 55, 1990.
[8] J. A. Hole, "Nonlinear high-resolution three-dimensional seismic travel time tomography," J. Geophys. Res., 97, 1992.
[9] M. Roulo, "Java's Three Types of Portability," http://www.javaworld.com, 1997.

Semantic Support for Research Project Reporting

Maria Elena Ordonez Faculty Mentor: Ann Q. Gates Cyber-ShARE Center of Excellence The University of Texas at El Paso El Paso TX 79968, USA Emails: [email protected], [email protected]

1. Introduction

An important part of a research-funded project is to report about participants, goals, objectives, results, and other information related to the project. An advantageous way to support reporting activities is through a project Web site because information can be entered by authorized personnel from anywhere in the world and at any time. In addition, posted information can be made available immediately for authorized parties.

During the preparation of the Cyber-ShARE end-of-the-year summary report, one of the main issues was information gathering on accomplishments associated with publications, research results, students funded, and other data needed to meet reporting requirements. While project participants were gathering information on their activities for their diverse reporting responsibilities, the focus, type, and detail of maintained information were not consistent across all participant information-keeping efforts. This made generation of the annual report for the Center slow and labor-intensive, as information gaps had to be filled from multiple sources and verified by multiple parties.

The main goals of this project are to define a systematic approach for collecting and sharing information among participants of a funded project, to facilitate posting of semantically annotated data to a Web portal, and to support automated feeding to report-generation tools. The objectives of the project are the following:

• Define an ontology to capture information about funded research projects.
• Use the defined ontology to create a Semantic Web portal.
• Use the content of the Semantic Web portal to generate annual reports.

2. Background

2.1 Ontology Concepts

An ontology to gather information about a project provides an efficient and effective way to represent project information in a formal way [1]. Existing ontologies that could be used for this purpose, however, tend to focus on representing business projects rather than research projects. Following the criteria of Jussupova-Mariethoz and Probst [2,3] to evaluate ontologies, a review of existing project ontologies revealed the following shortcomings of ontologies related to research projects:

• Granularity: Existing ontologies related to projects are too broad.
• Lack of administration perspective: Existing ontologies capture concepts related to research results, but they do not incorporate administrative grant information such as investigators and funding organization.
• Performance measurement and monitoring: Existing ontologies exclude information about evaluation reports made for funding agencies.

2.2 Reporting Requirements

This project uses the reporting requirements of the Cyber-ShARE Center of Excellence of The University of Texas at El Paso as a case study. The main requirements are set by the National Science Foundation (NSF) Centers of Research Excellence in Science and Technology [4], which is the program that provided the funding to create the center. In addition, the center is a participant in the NSF's Innovation through Institutional Integration (I3) program, which focuses on integrating related institutional projects as well as sharing resources of different research projects [5], and this project supports the efforts associated with that program. As a result, the Cyber-ShARE Web portal must reflect both independent work done as a center and collaborative work.

2.3 Semantic Portal

Semantic Web technologies facilitate information discovery. With the use of Drupal [6] as the development platform for the Web site, the information posted to the site can be organized in a meaningful way, making it easier to find information and use tools. Besides enabling development of Web sites with a professional look and feel, Drupal also enables participants from different projects, who may not have any programming experience, to create content and contribute to the Web site. Drupal supports the creation of entry forms that make it easier for users to input information.

3. Funded Research Ontology Design

The first phase of this project involved the design of an ontology for funded research projects. This ontology will provide a model for information gathering based on project concepts and an information structure expressed in terms of concept relations [1]. As a result of numerous interviews with the Cyber-ShARE Center director and staff, project concepts and their relations were defined. Examples of concepts are the following:

• Evaluation
• Funding Organization
• Goals
• Theme
• Funded Research Project
• Personnel

Table 1. Symbols used to define relations: Generalization; Has a / Is an attribute of; Is comprised of.

Table 1 shows the relations that were defined in the ontology. Attributes, which refer to the characteristics of each concept, were also defined for each concept. The main challenge is to differentiate attributes from concepts. For example, Date of an Evaluation was made a concept rather than an attribute because of its importance and reuse.

The initial definition of the concepts and their relations is depicted in Figure 1. The Evaluation concept is comprised of Subject, Data, Report, and Date. At the same time, the Report concept is comprised of the Institutional and Project Evaluation concepts. A project Goal consists of zero to many instances of Objective. In order to achieve each Objective of a project, there must be at least one or many instances of Activity.
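To make the relation types of Table 1 concrete, the following sketch encodes a small slice of the draft ontology as RDF/OWL triples with rdflib; the namespace and property names are hypothetical stand-ins, since the actual URIs of the Cyber-ShARE ontology are not given in this paper.

from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

# Hypothetical namespace; the real ontology's URIs are not specified here.
CS = Namespace("http://example.org/cybershare-ontology#")

g = Graph()
g.bind("cs", CS)

# A few concepts from the draft ontology expressed as OWL classes.
for concept in ("Evaluation", "Goal", "Objective", "Activity", "Deliverable",
                "Publication", "Personnel", "Student"):
    g.add((CS[concept], RDF.type, OWL.Class))

# The "Generalization" relation of Table 1 maps naturally to rdfs:subClassOf.
g.add((CS.Publication, RDFS.subClassOf, CS.Deliverable))
g.add((CS.Student, RDFS.subClassOf, CS.Personnel))

# "Is comprised of" and "Has a / Is an attribute of" sketched as object properties.
g.add((CS.isComprisedOf, RDF.type, OWL.ObjectProperty))
g.add((CS.hasAttribute, RDF.type, OWL.ObjectProperty))
g.add((CS.Goal, CS.isComprisedOf, CS.Objective))        # a Goal consists of Objectives
g.add((CS.Objective, CS.isComprisedOf, CS.Activity))    # Objectives are achieved by Activities
g.add((CS.Activity, CS.isComprisedOf, CS.Deliverable))  # Activities produce Deliverables

print(g.serialize(format="turtle"))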


Figure 1: A draft ontology for managing information associated with research projects.

An Activity can have one or many instances of Deliverable, such as:

• Publication
• Workshop
• Presentation
• Conference Organized
• Patent
• Proposal
• Partnership

The Publication concept has attributes Refereed and Submitted, which provide information about the current state of any publication, e.g., denoting whether the publication was submitted, accepted, or published. In addition, Publication can be comprised of instances of Handbook, Journal, Conference Proceedings, Abstract, and Technical Report. All of these concepts are necessary to identify the type of Publication that can result from an Objective that achieves a Goal.

For each project, it is necessary to have information about the individuals who participate in the research. The Personnel concept is comprised of instances of Collaborator, Consultant, Investigator, Staff, Advisory Board, and Student. The Student concept is comprised of Graduate and Undergraduate.

Conclusions

This paper presents the design of an ontology to represent, capture, and share information related to a funded research project. The design of the ontology focuses on supporting information gathering requirements to satisfy the reporting obligations of NSF-funded research projects, but it can be expanded to meet the reporting requirements of other funding agencies. Future work on this project will consist of the implementation of the ontology in a semantically-enabled content management system such as Drupal, support for automatic generation of annual reports, and potential expansion to support meeting the reporting needs of additional funding agencies.

Acknowledgements

I would like to acknowledge the support given by Dr. Ann Gates, Dr. Rodrigo Romero, Leonardo Salayandia and Patricia Reyes. This work was partially supported by NSF grants CNS-0837556 and CREST HRD-0734825.

References

1. Noy, N. F., and McGuinness, D., "Ontology Development 101: A Guide to Creating Your First Ontology," Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.
2. Sharman, R., and Kishore, R., "Computational Ontologies and Information Systems II: Formal Specification," Communications of the Association for Information Systems, Springer, New York, 2004, vol. 14, pp. 184-205.
3. Jussupova-Mariethoz, Y., and Probst, A., "Business Concepts Ontology for an Enterprise Performance and Competences Monitoring," Computers in Industry, vol. 58, 2006, pp. 118-129.
4. National Science Foundation Fastlane, https://www.fastlane.nsf.gov/NSFHelp
5. National Science Foundation Innovation through Institutional Integration Program, http://www.nsf.gov/pubs/2010/nsf10007/nsf10007.jsp
6. Drupal Documentation: www.drupal.org

Advanced Wireless Mesh Networks: Design and Implementation

Julio Castillo, MS Graduate Student UPRM, [email protected] Kejie Lu, Assistant Professor UPRM, [email protected] Department of Electrical and Computer Engineering University of Puerto Rico at Mayagüez, PR 00681-9048

Abstract

Wireless Mesh Networks (WMNs) have emerged as a fundamental technology for the next generation of wireless networking. They are formed by mesh routers and mesh clients. Mesh routers have minimal mobility and usually do not have energy constraints; mesh clients, on the other hand, are mobile. WMNs are self-forming, self-healing and self-organizing. Their easy configuration and deployment make them an economical, reliable and simple solution that can be implemented anywhere at any time. They have been applied to many areas such as education, government, municipalities and industries. The purpose of this project is to design and implement an advanced WMN indoor and outdoor testbed using the MAC Layer Routing Protocol (MACRT) and apply it in the Development of a Versatile Service-Oriented Wireless Mesh Network project (VESO-MESH). The analysis, design and implementation will be done using commercial off-the-shelf (COTS) hardware and free software. We used Linux distributions as the operating system, a modified Madwifi as the wireless driver, miniPCI and PCMCIA cards with Atheros-based chipsets, and a PDA, laptops, PCs, and AOpen computers as the mesh clients and mesh routers, respectively.

1. Introduction

Over the past years, wireless networks have been applied in many environments such as education, government, municipalities and industries. In some cases, the combination of wireless networks with wired networks empowers these environments with scalability, flexibility, and mobility. Wireless networking technology continues to increase in popularity around the world because it brings benefits such as mobility, low cost, easy and simple installation, and time savings.

A particular technology called Wireless Mesh Networks (WMN) is part of the wireless networking world. This technology marked the divergence from traditional centralized wireless systems such as wireless local area networks (WLANs). WMNs are formed by mesh routers and mesh clients [1]. Mesh routers have minimal mobility and do not have energy constraints, but mesh clients are mobile and energy constrained. These types of networks are self-forming, self-healing and self-organizing. Nowadays, there are many ongoing research projects about WMNs in different universities and research labs, such as MIT Roofnet [2], UCSB MeshNet [3], Microsoft Research [4] and CitySense [5]. Those projects are building mesh platforms based on off-the-shelf products and developing demanding applications and services.

However, the design and implementation of WMNs have to deal with some challenging issues such as architecture and protocol design, MAC and routing protocols, resource allocation and scheduling, network capacity, cost optimization, manageability, cross-layer design, and security. Thus, existing algorithms and protocols ranging from the physical layer to the application layer need to be enhanced or re-invented for WMNs.

2. Background

2.1 Wireless Mesh Networks

WMNs have emerged as an important technology for next generation wireless networks. WMN standardization is currently under development by the IEEE 802.11 Task Group. WMNs can be seen as a type of Mobile Ad Hoc Network (MANET). They are multi-hop wireless networks formed by mesh routers and mesh clients. The mesh routers have minimal mobility and do not have energy constraints. They form a wireless mesh backbone among them and forward the packets received from the mesh clients to the gateway router. The gateway router generally is connected to the Internet through a wired backbone, and permits the interconnection of ad hoc, sensor and cellular networks to the Internet (see Figure 2.1).

Figure 2.1: WMN Architecture [6].

There are also many applications and standards where WMNs are being considered: applications such as community and enterprise networks, defense systems, emergency networks, intelligent transport system networks, and surveillance systems; and standards such as IEEE 802.11, IEEE 802.15, IEEE 802.16, and IEEE 802.20.

However, the development of WMNs has to deal with some challenging issues in different layers of the protocol stack, such as architecture and protocol design, MAC and routing protocols, resource allocation and scheduling, network capacity, cost optimization, manageability, cross-layer design, and security. In addition, existing algorithms and protocols at each layer need to be enhanced or re-invented for WMNs.

2.2 Routing Protocols

Routing protocols specify how mesh nodes communicate with each other to disseminate information that allows them to select routes between any two nodes on a network. Routing protocols exchange information such as topology, load distribution, and link quality among the nodes of the network; discover and maintain routes/paths to network nodes; find routes to a new network node; and respond when a route breaks. The major objective of a routing protocol for WMNs is to determine high-throughput routes between nodes.

On the other hand, routing metrics are values used by routing protocols to select one route over another. They usually reflect communication cost in terms of required bandwidth, achievable delay, hop count, incurred load, reliability, energy consumption, etc. There are many metrics for WMNs in the literature, such as Hop Count, Per-Hop RTT, ETX [7], WCETT, and MIC.

2.2.1 Ad hoc On-Demand Distance Vector Routing (AODV)

AODV is a very popular routing protocol for MANETs. It is a reactive routing protocol; routes are created only when they are needed. It has been standardized in the IETF as experimental RFC 3561 [8]. This protocol is loop-free and avoids the counting-to-infinity problem by the use of sequence numbers. It offers quick adaptation to mobile networks with low processing and low bandwidth utilization. It uses a simple request-reply mechanism for the discovery of routes.

2.2.2 MAC Layer Routing Protocol (MACRT)

MACRT [9] is a new routing protocol inspired by both the AODV and STP routing protocols. It resides in the MAC layer of the protocol stack. It is based on AODV with improvements that include the ETX metric, dynamic link break detection, and client management. The mesh routers forward frames in the MAC layer, and thus the mesh routers and the clients are in the same subnet. Besides, only mesh routers need to run the protocol. MACRT uses the Multi-band Atheros Driver for Wi-Fi (Madwifi) [10] as the WLAN driver. Madwifi is based on Atheros chipsets [11]. The Madwifi driver supports the following five operation modes: STA, AP, IBSS (Ad-Hoc), Monitor and WDS (bridge). It supports PCI, miniPCI and Cardbus devices. Mesh routers running MACRT use Atheros 802.11 wireless chipsets with the Madwifi driver. Figure 2.2 depicts the principal components of MACRT, such as route discovery, link break handling, and client roaming.

Figure 2.2. Components of the MACRT routing protocol [9].
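For illustration, the ETX metric [7] referenced above estimates the expected number of transmissions needed to deliver a frame over a link from the probe-measured forward and reverse delivery ratios, and a route's ETX is the sum of its links' values. The sketch below is a generic illustration of the metric, not code taken from MACRT.

def link_etx(forward_delivery_ratio, reverse_delivery_ratio):
    """ETX of one link: expected transmissions (including retries) to deliver a frame,
    given probe-measured delivery ratios df and dr."""
    if forward_delivery_ratio <= 0 or reverse_delivery_ratio <= 0:
        return float("inf")            # link unusable
    return 1.0 / (forward_delivery_ratio * reverse_delivery_ratio)

def path_etx(links):
    """ETX of a route is the sum of its link ETX values; lower is better."""
    return sum(link_etx(df, dr) for df, dr in links)

# Example: a two-hop route over good links beats a one-hop route over a very weak link.
print(path_etx([(0.9, 0.9), (0.8, 0.9)]))   # ~2.62
print(path_etx([(0.45, 0.6)]))              # ~3.70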

3. The WMN Testbeds

3.1 WMN Testbed based on PCMCIA Cards

The design and implementation of the WMN testbed based on PCMCIA cards uses the MACRT routing protocol. The hardware and software used by this testbed were:

3.1.1 Hardware Used

• Dell Precision Workstation 360
• Cisco Aironet 802.11a/b/g Wireless PCI Adapter
• Laptop Dell Latitude D620
• Netgear WAG511 PCMCIA card
• D-Link WNA-1330 PCMCIA card
• Pocket PC Dell Axim X51v

3.1.2 Software Used

• Ubuntu Linux v8.04.3
o Kernel v2.6.22
o gcc v4.2
• Madwifi v0.9.3
• MACRT routing protocol

3.1.3 Configuration

A) Topology

Figure 3.1: WMN testbed based on PCMCIA cards topology.

B) Mesh Routers

In Figure 3.1, N1, N2 and N3 work as mesh routers. Each mesh router is configured following these steps:

• Install the Ubuntu Linux distribution v8.04.3 on the devices that will work as mesh routers.
o Compile and install kernel v2.6.22
o Compile and install gcc v4.2
• Install the MACRT routing protocol
o Compile and install kmacrt
o Compile and install Madwifi v0.9.3
o Compile umacrt
o Run the scripts to create both interfaces, ath0 and mesh1.

C) Mesh Clients

Mesh clients should have at least one wireless 802.11 a/b/g/n interface. There are two ways to configure the mesh clients: we can assign them an IP address by DHCP, or we can set static IP addresses.

3.2 WMN Testbed based on Alix Boards

The design and implementation of the WMN testbed based on Alix boards uses the MACRT routing protocol. The hardware and software used by this testbed were:

3.2.1 Hardware

• Alix 2D2 board
• AC/DC 15V 1.25A 18W Switching Power Supply Adapter
• Indoor ALIX.2D2 Enclosure
• 4GB Compact Flash card
• Wistron miniPCI CM9 802.11 a/b/g
• RPSMA Pigtail
• 5.5 dBi Rubber Duck Omni Antenna with RP-SMA connector
• Compact Flash (CF) Card Reader Adapter

3.2.2 Software

• Debian Linux v5.0
o Kernel v2.6.22.19
o gcc v4.2.4
• Madwifi v0.9.3
• MACRT routing protocol

3.2.3 Configuration

A) Topology

Figure 3.2: WMN testbed based on Alix boards topology.

B) Mesh Routers

In Figure 3.2, Alix boards are used as mesh routers. They have some hardware limitations; for that reason, it is not convenient to install a complete Linux distribution on them, and only the minimal packages are installed to get better performance. Each mesh router is configured following these steps:

Laptop Configuration

• Install the Debian Linux distribution v5
• Create a .deb kernel v2.6.22 package
• Compile and install kernel v2.6.22
• Compile and install gcc v4.2
• Install some packages needed to install MACRT
• Install MACRT
o Compile and install kmacrt
o Compile and install Madwifi v0.9.3
o Compile umacrt

CF Card Configuration

• Install the Debian Linux distribution v5
• Install the .deb kernel v2.6.22 package previously created on the CF card
• Configure some parameters such as the network, TTYs, grub, modules to be loaded, etc.

Alix Board Configuration

• Import all the .ko files from the laptop to the Alix board into the /lib/modules/2.6.22.19/net folder
• Run the scripts to create both interfaces, ath0 and mesh1

C) Mesh Clients

Mesh clients are configured similarly to the previous testbed, by DHCP or by setting static IP addresses.

4. VESO-MESH Testbed

The main objective of the VESO-MESH project [12] is to develop a versatile service-oriented wireless mesh network (WMN) that can be quickly built to respond to natural disasters and that establishes a data-intensive environmental monitoring application specifically addressing hurricanes and earthquakes.

The VESO-MESH testbed is being designed and implemented as a first step toward achieving the VESO-MESH project objectives. It is based on the two WMN testbeds previously configured. Figure 4.1 depicts the topology.

Figure 4.1: VESO-MESH Testbed Topology.

5. Conclusions

In this work we successfully designed and implemented two testbeds, based on PCMCIA cards and Alix boards, using commercial off-the-shelf (COTS) hardware and free software such as laptops, PCs, cards based on Atheros chipsets, the Madwifi driver, and the Ubuntu and Debian Linux distributions. We showed all the hardware and software used for each testbed and presented all the necessary steps to achieve a successful configuration. At the same time, we are working on the design and implementation of the VESO-MESH testbed, which will serve as a platform to develop new routing algorithms and protocols in the near future.

Acknowledgements

This work is supported by NSF MRI Grant - Award Number 0922996. Presentation of this poster was supported in part by NSF Grant CNS-0837556.

References

[1] I. Akyildiz and X. Wang, "A survey on wireless mesh networks," IEEE Communications Magazine, vol. 43, 2005, pp. S23-S30.
[2] J. Bicket, D. Aguayo, S. Biswas, and R. Morris, "Architecture and evaluation of an unplanned 802.11b mesh network," Proceedings of the 11th Annual International Conference on Mobile Computing and Networking, Cologne, Germany: ACM, 2005, pp. 31-42.
[3] "UCSB MeshNet."
[4] "Self Organizing Wireless Mesh Networks - Microsoft Research."
[5] R. N. Murty, G. Mainland, I. Rose, A. R. Chowdhury, A. Gosain, J. Bers, and M. Welsh, "CitySense: An Urban-Scale Wireless Sensor Network and Testbed," 2008 IEEE Conference on Technologies for Homeland Security, Waltham, MA, USA, 2008, pp. 583-588.
[6] I. Akyildiz and X. Wang, Wireless Mesh Networks, Wiley, 2009.
[7] D. S. J. D. Couto, D. Aguayo, J. Bicket, and R. Morris, "A high-throughput path metric for multi-hop wireless routing," Wireless Networks, vol. 11, 2005, pp. 419-434.
[8] C. Perkins, E. Belding-Royer, and S. Das, "Ad hoc On-Demand Distance Vector (AODV) Routing," 2003.
[9] W. Zhao, "A New MAC Layer Routing Protocol for Infrastructure Wireless Mesh Networks," Technical University of Denmark, DTU, 2008.
[10] "madwifi-project.org - Trac."
[11] "Atheros, wireless local area networking, wireless, WLAN."
[12] "MRI: Development of A Versatile Service-Oriented Wireless Mesh Network for Disaster Relief And Environmental Monitoring in Puerto Rico."

STUDENT POSTERS

(available online @ cahsi.org/4thannualposters)

1. Multi-Agent Simulation using Distributed Linux Cluster (Nathan Nikotan, CSU-DH)

2. Using Video Game Concepts to Improve the Educational Process (Daniel Jaramillo, NMSU)

3. Finding Patterns of Terrorist Groups in Iraq: A Knowledge Discovery Analysis (Steven Nieves,UPR-Politécnic)

4. Morphological Feature Extraction for Remote Sensing (José G. Lagares, UPR-Politécnic)

5. Unsupervised Clustering of Verbs Based on the Tsallis-Torres Entropy Text Analyzing Formula (Gina Colón, UPR-Politécnic)

6. Document Classification using Hierarchical Temporal Memory (Roxana Aparicio, et al., UPR-M)

7. Low Power Software Techniques for Embedded Real Time Operating Systems (Daniel Mera, et al., UPR-M)

8. Object Segmentation in Hyperspectral Images (Susi Huamán, et al., UPR-M)

9. Hyperspectral Texture Synthesis by 3D Wavelet Transform (Néstor Diaz, et al., UPR-M)

10. Routing Performance in an Underground Mine Environment (Joseph Gonzalez, UPR-M)

11. Leveraging Model Fusion to Improve Geophysical Models (Omar Ochoa, et al., UTEP)

12. Materialography Using Image Processing with OpenCV and C# (Oscar Alberto Garcia, et al., UPR-M)

13. Network Security: A Focus in the IEEE 802.11 Protocols (Brendaliz Román, UPR-Politécnic)

14. Towards a Systematic Approach to Create Accountable Scientific Systems (Leonardo Salayandia, UTEP)

15. A Specification and Pattern System to Capture Scientific Sensor Data Properties (Irbis Gallegos, UTEP)

16. An Assistive Technology Tool for Text Entry based on N-gram Statistical Language Modeling (Anas Salah Eddin, et al., FIU)

17. Detecting the Human Face Vasculature Using Thermal Infrared Imaging (Ana M. Guzman, et al., FIU)

18. An Incremental Approach to Performance Prediction (Javier Delgado, FIU)

19. Using Genetic Algorithms for Scheduling Problems (Ricardo Macias, FIU)

20. Semantic Support for Weather Sensor Data (Jessica Romo, et al., UTEP)

STUDENT POSTERS (CONT'D)

21. STg: Cyberinfrastructure for Seismic Travel Time Tomography (Cesar Chacon, et al., UTEP)

22. Visualization of Inversion Uncertainty in Travel Time Seismic Tomography (Julio C. Olaya, UTEP)

23. Serious Games for 3D Seismic Travel Time Tomography (Ivan Gris, UTEP)

24. Semantic Support for Research Projects (Maria E. Ordonez, UTEP)

25. Advanced Wireless Mesh Networks: Design and Implementation (Julio Castillo, UPR-M)

26. Supporting Scientific Collaboration Through the Flow of Information (Aída Gándara, UTEP)

27. Caching to Improve Provenance Visualization (Hugo D. Porras, et al., UTEP)

28. Prospec 2.1: A Tool for Generating and Validating Formal Specifications (Jesus Nevarez, UTEP)

29. Checking for Specification Inconsistencies: A Prospec Component (Jorge Mendoza, et al., UTEP)

30. Integrating Autodesk Maya to Microsoft XNA (Carlos R. Lacayo, UH-D)

31. VizBlog: From Java to Flash Deployment (Joralis Sánchez, UPR-M)

32. Entropy Measures Techniques to Characterize the Vocalizations of Synthetic and Natural Neotropical Anuran (Marisel Villafañe, et al, UPR-M.)

33. Predicting Survival Time From Genomic Data (María D. González Gil, UPR-M)

34. WIMS Cochlear Implant: Support to Test Electrode Array (Wilfredo O. Cartagena-Rivera, et al., UPR-M)

35. Hyperspectral Image Analysis for Abundance Estimation using CUDATM (Amilcar González, et al., UPR-M)

36. iPhone-based Digital Streaming Data Exchange for Species Classification in a Mesh Wireless Sensor Network (Nataira Pagán, et al., UPR-M)

37. Genetic Sequence Editor (Rey D. Sánchez, UTPA)

38. A Web-based User Interface for the Time-Frequency Representation of Environmental Bio-acoustics Signals (Laura M. Matos, et al., UPR-M)

39. KPAG Software, from Kronecker Product Algebra to Graphs (Richard Martínez Sánchez, UPR- RP)

40. Parallax: A Progress Indicator for MapReduce Pipelines (Kristi Morton, et al., U of Washington)

41. Visual Comparison Tool for Aggregate Data (Hooman Hemmati, et al., UH-D)

STUDENT POSTERS (CONT'D)

42. Teaching Entry-Level Programming concepts to Aspiring CS Majors through RPG Maker VX Games (David Salas, NMSU)

43. Video Game Design and Development Using the G.E.C.K (Bretton Murphy, UH-D)

44. Text Visualization Using Computer Software (Rafael Cruz Ortiz, UH-D)

45. Using Panda3D to Create 3D Games (Jeremiah Davis, NMSU)

46. Tracking Moving Objects Using Two Synchronous Cameras (Diego Rojas, TAMU-CC)

47. GPU Programming: Transferring the Data or Recomputing in a Many-body Problem (Axel Y. Rivera Rodríguez, UPR-H)

48. A Comparison of Text based languages and Visually based IDEs used for creating Computer Games (Richard G. Trujillo, UPR- RP)

49. XBOX Live: Could you be at Risk While Playing Online? (Jeremy Cummins, et al.)

50. Hyperspectral Image Processing Algorithms for Cancer Detection on CUDA/GPUs (Christian Sánchez-López, et al., UPR-M)

51. Performance of Routing Protocols in Unmanned Aerial Vehicles (Deisi Ayala, CSU-DH)

SPEAKER BIOGRAPHIES

Dr. Malek Adjouadi

Malek Adjouadi is currently a Professor with the department of Electrical and Computer Engineering (ECE) at Florida International University. He is the founding director of the Center for Advanced Technology and Education (NSF-CATE) funded by the National Science Foundation since its inception in 1993. Dr Adjouadi has also led the efforts in establishing the joint Neuro-engineering program between FIU and Miami Children’s Hospital in pediatric epilepsy. His research interests are in image/signal processing, applications of neuroscience, and assistive technology research focusing on visual impairment and severe motor disabilities.

Dr. Cecilia Aragon

Dr. Cecilia R. Aragon has been a Staff Scientist in the Computational Research Division at Lawrence Berkeley National Laboratory since 2005, after earning her Ph.D. in Computer Science from UC Berkeley in 2004. She earned her B.S. in mathematics from the California Institute of Technology. Her current research focuses on the uses of visualization in scientific collaborations, and she is interested in how social media and new methods of computer-mediated communication are changing scientific practice. She has developed novel visual interfaces for collaborative exploration of very large scientific data sets, and has authored or co-authored over 30 peer-reviewed publications and over 100 other publications in the areas of computer-supported cooperative work, human-computer interaction, visualization, visual analytics, image processing, machine learning, and astrophysics. She has received many awards for her research, including the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2008 and 4 Best Paper awards since 2004. She was recently named one of the Top 25 Women of 2009 by Hispanic Business Magazine.

Dr. Mohsen Beheshti

Mohsen Beheshti is the Chair and Professor of Computer Science Department at California State University - Dominguez Hills. His research interests include Network Security, Data Conversion, Multidisciplinary research, and Curriculum Development. He is the director of Center for Excellence in Knowledge Management and Computational Science (CECS) to promote research and education for the college. He is also the director of Computer Science Research Lab (CSRL) conducting research in Intrusion Detection system and Data Mining in collaboration with other faculty and undergraduate/graduate students. He has developed new graduate and undergraduate programs to better prepare students to the workforce and in their further studies. He has extensive experience in developing supervised/closed laboratories and has developed course and lab manuals. Dr. Beheshti has received many grants to advance technology in education and has hosted series of workshops for high school teachers to further promote area of computer science to high school students. He is also a member of IEEE, ACM, Sigma Xi, CAHSI, RISSC and SACNAS.

Dr. Sheryl Burgstahler

Dr. Sheryl Burgstahler directs DO-IT (Disabilities, Opportunities, Internetworking and Technology) and Accessible Technology at the University of Washington. DO-IT promotes the success of students with disabilities in postsecondary programs and careers. Along with Professor Richard Ladner, she directs AccessCareers project which is funded by NSF to increase the participation and success of individuals with disabilities in computing and IT fields

Bruce C. Edmunds, M.Ed.

Since 2009, Bruce Edmunds has been the Program Manager for The Computing Alliance of Hispanic Serving Institutions (CAHSI) and Scaling and Adapting CAHSI Initiatives (SACI) an NSF extension grant focused on implementing CAHSI initiatives in newly adopted CAHSI Institutions. He has served as a reviewer for The NCWIT Award for Aspirations in Computing. Mr. Edmunds received a Masters in Educational and Instructional Technology specializing in Curriculum Design from the University of Texas at El Paso in 2000. Prior to serving as Program Manager for CAHSI, his educational experience included teaching Computer Science at Cathedral High School in El Paso, Texas where he also served as Technology Coordinator, and Curriculum Development Department Chair.


SPEAKER BIOGRAPHIES

Dr. John Fernandez

Dr. John D. Fernandez is Chair of the Department of Computing Sciences, which includes Computer Science, Engineering and Geographic Information Sciences. He is a Professor and an Endowed Chair for Computer Science, Executive Member of the NSF BPC Computing Alliance of Hispanic Serving Institutions, and PI or Co-PI on several other federal and state grants, securing $4.5 million dollars of grant funding in five years. He was awarded the 2008 Maestro Award by the Society of Mexican American Engineers and Scientists and was voted Vice Chairman of the Corpus Christi Digital Community Development Corporation. Dr. Fernandez has a Ph.D. from Texas A&M University and an M.S. from West Virginia University. He completed his B.A. in mathematics at Texas A&M –Kingsville. Dr. Fernandez is a native South Texan who spent 20 years in the U.S. Air Force, finishing his career at the Pentagon while serving as the Assistant for Computer Science to Dr. David S. C. Chu, Assistant Secretary of Defense for Program Analysis and Evaluation. Dr. Fernandez held several executive positions in industry, including serving as CEO of Operational Technologies Corporation, an engineering consulting firm with government and business contracts. His research interests are in software engineering, HCI and CS education. Working on programs for women and minorities to become computer science professionals is his greatest passion.

Paco Flores

Paco Flores is the Program Coordinator for Promotion and Outreach at the Hispanic Scholarship Fund (HSF). In this role, he speaks to students and administrators in individual and mass communications to ensure that Latinos around the country are aware of the opportunities provided by the Hispanic Scholarship Fund and the Gates Millennium Scholars (GMS) Program. Paco also heads a Gates initiative geared towards increasing the college enrollment rate of minority male students. Paco graduated from the University of California, Santa Barbara in Psychology and History. Born and raised in California, he lives and works in San Francisco.

Dr. Gilda Garretón

Gilda Garretón is a principal engineer at SunLabs/Oracle, and her main research focuses on VLSI CAD algorithms. Since 2008, she has been investigating alternative methods for multi-threading VLSI tools. Before SunLabs, Gilda researched 2D/3D mesh generation algorithms suitable for device and process simulation. Gilda is an Open Source advocate and a Java/C++ developer. She received her B.A. and Engineering degree with honors from the Catholic University of Chile (PUC) and her Ph.D. from the Swiss Federal Institute of Technology, Zurich (ETHZ). She joined Sun Microsystems Laboratories, an Oracle division as of February 2010, in 2004. Before that, she worked as a consultant at the Catholic University of Chile and as an IT analyst/project manager at UBS, a Swiss bank, in Zurich and Stamford, CT. Gilda has been a mentor at Sun Microsystems and MentorNet, and in 2006 she co-founded the community Latinas in Computing (LiC), whose main goal is to promote leadership and professional development among Latinas in the field. In 2009, Gilda and Latinas in Computing were honored by Caminos Pathways and the City of San Francisco for their diversity work.

Dr. Ann Gates

Dr. Ann Quiroz Gates is the Associate Vice President of Research and Sponsored Projects at the University of Texas at El Paso and past chair of the Computer Science Department. Her research areas are software property elicitation and specification and workflow-driven ontologies. Gates directs the NSF-funded Cyber-ShARE Center, which focuses on developing and sharing resources through cyber-infrastructure to advance research and education in science. She was a founding member of the NSF Advisory Committee for Cyberinfrastructure, and she serves on the Board of Governors of the IEEE Computer Society. Gates leads the Computing Alliance of Hispanic-Serving Institutions (CAHSI), an NSF-funded consortium focused on the recruitment, retention, and advancement of Hispanics in computing, and is a founding member of the National Center for Women & Information Technology (NCWIT), a national network to advance the participation of women in IT. Gates received the 2009 Richard A. Tapia Achievement Award for Scientific Scholarship, Civic Science, and Diversifying Computing and was named to Hispanic Business magazine's 100 Influential Hispanics in 2006 for her work on the Affinity Research Group model, which focuses on the development of undergraduate students involved in research.

Dr. Bradley Jensen

Dr. Bradley Jensen is a Senior Academic Relationship Manager with Microsoft Corporation. He brings extensive relationship, teaching, research, product marketing, sales, and domestic/international executive experience, with demonstrated capabilities in curriculum development, course design and delivery, research, consulting, manufacturing, document management, publishing, telecommunications, aerospace, legal, and commercial property management. Dr. Jensen's proven accomplishments are in information security, software development, project management, strategic alliances, e-commerce, strategic marketing, P&L management, team building, and consultative selling. Key milestones at Microsoft Corporation include: establishing an international university consortium, for which he co-chairs the Board of Directors, that provides large (terabyte-scale) live datasets from several Fortune 100 corporations for use in the classroom and in research; converting several courses from competing technology to Microsoft technology; improving relations at managed universities by working with administration and faculty to improve the perception and use of Microsoft technology; establishing Critical Infrastructure Protection modules for use by faculty around the world; co-chairing the Board of Directors for STEM and the Texas Business and Education Consortium; and being named Academic Relationship Manager of the Year for 2006.


Dr. Henry Jerez

Dr. Henry Jerez holds a Ph.D. in Computer Engineering from the University of New Mexico, where he performed research in the areas of distributed Digital Object repositories, load balancing, and wireless communications. Dr. Jerez is a Senior Program Manager in the Extreme Computing Group at Microsoft Research, where he works on distributed system security and cloud computing. Previously he worked at CNRI as a Senior Research Scientist, where he created the TNA network and was the technical lead for the ADL-R CORDRA project. He has contributed to the areas of digital libraries and Digital Object repository architecture as a member of the prototyping team for the Research Library at Los Alamos National Laboratory, where he worked from 2002 to 2005 performing research and development on the new LANL digital object architecture. Dr. Jerez has served as a faculty member, conference reviewer, enterprise consultant, and international advisor to several companies and universities across Latin and North America.

Dr. Patty Lopez

Dr. Patty Lopez received her BS, MS, and PhD degrees in computer science, spent 19 years as an Imaging Scientist for HP, and now works on microprocessor logic validation for Intel in Fort Collins, Colorado. She has released over fifty imaging products and received seven imaging patents. In 2003, she was named a Distinguished Alumna of the New Mexico State University College of Arts and Sciences. Patty currently serves as VP of the Women at Intel employee group, a MentorNet mentor, a FIRST LEGO League robotics coach, and an industrial advisory board member for several programs at NMSU. She joined the Computing Alliance of Hispanic-Serving Institutions (CAHSI) Board in 2010 and represents Intel on the 2010 Grace Hopper Celebration of Women in Computing Industrial Advisory Board. She is a founding member and co-chair of Latinas in Computing, a grassroots organization whose mission is to promote and develop Latinas in technology.

Dr. Dina Requena

Dr. Requena currently works as a Senior Portfolio Launch Manager in the IBM Systems and Technology Group. She manages the launches and go-to-market strategy, including the launch budget, for the BladeCenter and System x product lines. Prior to this, she worked as a Senior Program Manager in the Systems and Technology Group, where she managed various system management software and solutions implementations. Before that, Dina worked in the IBM CIO Business Enterprise Information organization, where she led the IT strategy and implementation of worldwide projects that generate millions of dollars in revenue for IBM by enhancing business operations performance. Dr. Requena has been a part of divisions such as Retail Store Solutions, Integrated Supply Chain, and IBM Corporate CIO. Prior to IBM, she worked as a Consultant for the World Bank, as a Process Engineer for a Peruvian confectionery company, and as a Teaching and Research Assistant at the universities where she obtained her graduate degrees. She was the founding president of the SHPE North Carolina professional chapter, founded in 2004, and the founding Treasurer of the SHPE North Carolina State University student chapter, founded in 1995. Dr. Requena teaches the MEM course Managing Product Development.

Dr. Steven Roach

Steve Roach received his Ph.D. in Computer Science from the University of Wyoming in 1997. He has 12 years of industrial software development experience in data acquisition, process control, and chemical process modeling, as well as software development for NASA's Cassini mission and the International Space Station. He has been using cooperative learning and the Affinity Research Group model in his courses and research since 1999. In 2002 and 2003, he chaired the IEEE CCSE Sub-Committee on Advanced Software Engineering Curricula; the CCSE is an international organization developing models of undergraduate and graduate software engineering programs. In 2003, he chaired the panel session "The Art of Getting Students to Practice Team Skills" at the 33rd ASEE/IEEE Frontiers in Education Conference (with E. Villa, J. Sullivan, R. Upchurch, and K. Smith). He is an IEEE-CS Certified Software Development Professional and a program evaluator for the Computing Accreditation Commission of ABET. He served as a reviewer for SWEBOK and for the Graduate Software Engineering Curriculum 2009. He is the Associate Chair of Computer Science at the University of Texas at El Paso.

Dr. Nayda Santiago

Nayda G. Santiago received the B.S.E.E. degree from the University of Puerto Rico, Mayaguez Campus, in 1989, the M.Eng.E.E. degree from Cornell University in 1990, and the Ph.D. degree in Electrical Engineering from Michigan State University in 2003. She is an Associate Professor in the Electrical and Computer Engineering Department at the University of Puerto Rico, Mayaguez Campus. She is Co-PI of the Femprof program and works with CAHSI to disseminate the Affinity Research Group (ARG) model. Nayda has been the recipient of the 2008 Outstanding Professor of Electrical and Computer Engineering Award, the 2008 Distinguished Computer Engineer Award of the Puerto Rico Society of Professional Engineers and Land Surveyors, the 2008 HENAAC Education Award, and the 2009 Distinguished Alumni Award of the University of Puerto Rico, Mayaguez Campus. She is a member of the IEEE and the ACM.


Dr. Valerie Taylor

Valerie E. Taylor earned her B.S. in Electrical and Computer Engineering and M.S. in Computer Engineering from Purdue University in 1985 and 1986, respectively, and a Ph.D. in Electrical Engineering and Computer Science from the University of California, Berkeley, in 1991. From 1991 to 2002, Dr. Taylor was a member of the faculty in the Electrical and Computer Engineering Department at Northwestern University. She joined the faculty of Texas A&M University as Head of the Dwight Look College of Engineering's Department of Computer Science in January 2003 and also currently holds the Royce E. Wisenbaker Professorship II. Her research interests are in the areas of computer architecture and high performance computing, with particular emphasis on mesh partitioning for distributed systems and the performance of parallel and distributed applications. She has authored or co-authored over 80 papers in these areas. Dr. Taylor has received numerous awards for distinguished research and leadership, including the 2002 IEEE Harriet B. Rigas Award, which recognizes a woman with significant contributions to engineering education; the 2002 Outstanding Young Engineering Alumni Award from the University of California at Berkeley; the 2002 Nico Habermann Award for increasing diversity in computing; and the 2005 Tapia Achievement Award for Scientific Scholarship, Civic Science, and Diversifying Computing. Dr. Taylor is a member of the ACM and a Senior Member of the IEEE-CS.

Jacqueline Thomas

Jacqueline Thomas, a native of the Republic of Panama, earned a degree in Psychology at Long Island University in New York. She has worked for the Mexican American Legal Defense and Educational Fund and the Latin American Association. Ms. Thomas founded the IR Group, a marketing and PR firm targeting Georgia's growing Latino community. She returned to the non-profit arena to work on a three-year program for the Hispanic Scholarship Fund. She joined GEM as a national recruiter focusing on developing graduate-level opportunities for the Latino community in engineering and science. She has presented at the national SHPE conference, SACNAS, NAMEPA, and the NSBE national conference.

Dr. Juan Vargas

Dr. Juan E. Vargas obtained a BSEE from the University of Texas at El Paso (UTEP), an MS in Biomedical Engineering from the Center of Advanced Studies of the National Polytechnic Institute in Mexico, and a PhD in Biomedical Engineering from Vanderbilt University. Dr. Vargas was a professor of Computer Science & Engineering at the University of South Carolina from 1988 to 2004, where he conducted research and taught data mining, Bayesian networks, embedded and distributed systems, data structures and algorithms, programming languages, and operating systems. He maintains a strong relationship with USC, where he is still a faculty member. Dr. Vargas was a Sr. Academic Relations Manager at Microsoft from May 2004 to August 2007 and then Manager of Google's University Relations for two years; in August 2009, he joined the recently created Microsoft Research Extreme Computing Group (MSR/XCG). Dr. Vargas has published more than 60 articles and several book chapters and has presented at many conferences.
