Image, Speech and Natural Language Processing Approach for Building a Social Networking for the Visually Impaired
International Journal of Latest Research in Science and Technology
ISSN (Online): 2278-5299, Volume 2, Issue 5, Page No. 52-57, September-October 2013
https://www.mnkpublication.com/journal/ijlrst/index.php

IMAGE, SPEECH AND NATURAL LANGUAGE PROCESSING APPROACH FOR BUILDING A SOCIAL NETWORKING FOR THE VISUALLY IMPAIRED

1N. Vignesh, 2S. Sowmya
1Research Associate, IIM Ahmedabad
2SDE, ACS Oracle India

Abstract - A social network is a social structure made of nodes (generally individuals or organizations) that are tied by one or more specific types of interdependency, such as values, visions, ideas, financial exchange, friendship, kinship, dislike, conflict or trade. The resulting graph-based structures are often very complex. Social network analysis views social relationships in terms of nodes and ties: nodes are the individual actors within the networks, and ties are the relationships between the actors. There can be many kinds of ties between the nodes. Research in a number of academic fields has shown that social networks operate on many levels, from families up to the level of nations, and play a critical role in determining the way problems are solved, organizations are run, and the degree to which individuals succeed in achieving their goals. Our system puts forth integrated software that also converts multimedia content into speech, to give a better definition of social networking for the visually impaired.

Keywords - Multimedia, Orally, Visually Impaired

Publication History: Manuscript Received: 18 October 2013; Manuscript Accepted: 25 October 2013; Revision Received: 28 October 2013; Manuscript Published: 31 October 2013

1. INTRODUCTION

Presently, social networking has become part and parcel of everyone's life, and the advent of visual multimedia applications has added value to it. There are many social networking sites today targeted at specific age groups, yet there is not a single networking site that gives the visually impaired hands-on experience of the social networking world. Although application software exists to overcome this problem, scarcely any effort has been made to integrate it into a social networking platform. Currently the most widely used application software that allows the visually impaired to enter a social networking site is JAWS. The problem with JAWS is that it can only voice information that is textually encoded; it is essentially an old-school screen-reading program that no longer suffices for present-day social networking. We therefore propose integrated software that also converts multimedia content into speech, giving a better definition of social networking to the visually impaired. However, parsing an image into various sub-components (using AND-OR trees) and feeding the result straight to a text-to-speech converter does not really work, because the generated text is grammatically insufficient to explain itself. So we make use of the I2T converter designed by Benjamin Z. Yao et al. to parse images into text components, which provides a reasonably better solution: it uses a text planner and a text analyzer to produce a grammatically almost-correct description of a parsed picture. The working of this I2T converter is closely related to our problem, since it also uses the AND-OR tree hierarchy for parsing images into text, together with the aforementioned text enhancements. Our problem additionally involves importing non-lexical fillers into the AND-OR tree to provide a more realistic rendering. We have therefore made an effort to integrate I2T and T2S (text-to-speech) software into a social networking base to help the visually impaired enjoy the pleasure of social networking.

The paper is organized as follows: Section 2 deals with related works (prior arts) and compares our system with existing works. Section 3 describes the working of the proposed system from the design and implementation aspect. Section 4 concludes with future works.

2. PRIOR ARTS AND PROPOSED MODEL

2.1 JAWS (Job Access With Speech)

JAWS (Job Access With Speech) is a computer screen reader program for Microsoft Windows that allows blind and visually impaired users to read the screen either with text-to-speech output or with a refreshable Braille display. JAWS was originally created for the MS-DOS operating system and was one of several screen readers giving blind users access to text-mode MS-DOS applications. A feature unique to JAWS at the time was its use of cascading menus, in the style of the popular Lotus 1-2-3 application. What set JAWS apart from other screen readers of the era was its use of macros that allowed users to customize the user interface and work better with various applications. Ted Henter and Rex Skipper wrote the original JAWS code in the mid-1980s, releasing version 2.0 in mid-1990. Skipper left the company after the release of version 2.0, and following his departure Charles Oppermann was hired to maintain and improve the product. Oppermann and Henter regularly added minor and major features and frequently released new versions. Freedom Scientific now offers JAWS for MS-DOS as a freeware download from their web site. In 1993, Henter-Joyce released a highly modified version of JAWS for people with learning disabilities; this product, called Word Scholar, is no longer available.
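The contrast between a conventional screen reader, which voices only textual content, and the proposed pipeline, which also routes images through an image-to-text step before speech synthesis, can be sketched as follows. This is a minimal illustration only: all function names and the description lookup are hypothetical placeholders, not real JAWS or I2T APIs.

```python
# Sketch of text-only screen reading versus the proposed I2T + T2S flow.
# All names here are illustrative placeholders, not actual library calls.

def screen_reader_speak(post):
    """Classic screen-reader behaviour: only textual content is voiced."""
    return post.get("text", "")  # images are silently skipped

def i2t_describe(image_tag):
    """Stand-in for the I2T image-to-text parser (Yao et al.).

    A real implementation would parse the image with an And-Or graph and
    generate a sentence; here we fake it with a small lookup table.
    """
    demo_descriptions = {"beach.jpg": "A person standing on a sunny beach."}
    return demo_descriptions.get(image_tag, "An image that could not be parsed.")

def proposed_speak(post):
    """Proposed behaviour: voice the text AND a generated image description."""
    parts = [post.get("text", "")]
    for img in post.get("images", []):
        parts.append(i2t_describe(img))
    # The joined string would then be handed to a text-to-speech engine.
    return " ".join(p for p in parts if p)

post = {"text": "Holiday snaps!", "images": ["beach.jpg"]}
print(screen_reader_speak(post))  # only the text is voiced
print(proposed_speak(post))       # text plus the image description
```

In this sketch the final string would be passed to whatever T2S engine the platform integrates; the point is only that image posts gain a spoken description instead of being skipped.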
2.1.1 INABILITIES OF JAWS

JAWS convinced people that the visually impaired can access computers in much the same way as their sighted counterparts. But with the advent of multimedia applications in social networking sites, the inability of JAWS was exposed: it cannot parse images to produce vocal descriptions of them. Not to disrespect JAWS, it enabled about 867 visually impaired users to enjoy the pleasure of social networking (Facebook).

2.1.2 OUR ADVANCEMENT OVER JAWS

We thought that by adding image parsing capabilities to an interface, we would take social networking for the visually impaired to another level. Our project mainly aims at integrating an interface that takes care of the overheads of converting text to speech and vice versa, and of parsing images and delivering them in text form. Not to mention, it would also replace the conventional Braille key-logger method with a fully fledged voice interface.

2.2 IMAGE TO TEXT PARSER (I2T)

2.2.1 OVERVIEW

The fast growth of public photo and video sharing websites, such as "Flickr" and "YouTube", provides a huge corpus of unstructured image and video data over the Internet. Searching and retrieving visual information from the Web, however, has been mostly limited to the use of meta-data, user-annotated tags, captions and surrounding text (e.g. the image search engine used by Google). Yao et al. present an image parsing to text description (I2T) framework that generates text descriptions in natural language based on an understanding of image and video content. Fig. 1 illustrates the two major tasks of this framework, namely image parsing and text description. By analogy to natural language understanding, image parsing computes a parse graph of the most probable interpretation of an input image. This parse graph includes a tree-structured decomposition of the contents of the scene, from scene labels, to objects, to parts and primitives. Besides the parsing engine itself, the framework comprises three major components:

An And-Or Graph (AoG) visual knowledge representation that embodies vocabularies of visual elements, including primitives, parts, objects and scenes, as well as a stochastic image grammar that specifies syntactic (compositional) relations and semantic relations (e.g. categorical, spatial, temporal and functional relations) between these visual elements. The categorical relationships are inherited from WordNet, a lexical semantic network of English. The AoG not only guides the image parsing engine with top-down hypotheses but also serves as an ontology for mapping parse graphs into semantic representations.

A Semantic Web that interconnects different domain-specific ontologies with the semantic representations of parse graphs. This step helps to enrich parse graphs derived purely from visual cues with other sources of semantic information. For example, if an input picture carries the text tag "Oven's mouth river", a GIS database embedded in the Semantic Web makes it possible to relate the picture to a geo-location: "Oven's mouth preserve of Maine State". Another benefit of using Semantic Web technology is that end users not only can access the semantic information of an image by reading the natural language text report but can also query the Semantic Web using standard semantic query languages.

A text generation engine that converts semantic representations into human-readable and queryable natural language descriptions.

As simple as the I2T task may seem for a human, it is by no means an easy task for any computer vision system today, especially when input images are of great diversity in content (i.e. number and category of objects) and structure (i.e. spatial layout of objects), which is certainly the case for images from the Internet. But in certain controlled domains automatic image parsing is practical. The objective is therefore twofold: (a) a semi-automatic (interactive) method is used to parse general images from the Internet in order to build a large-scale ground-truth image dataset, from which the AoG is learned for visual knowledge representation, the goal being to make the parsing process more and more automatic using the learned AoG models; and (b) in controlled domains where the camera is static, the background only needs to be parsed (interactively) once at the beginning, and all other components are then handled automatically.
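The And-Or decomposition behind the parse graph can be made concrete with a toy sketch: And-nodes decompose an entity into all of its parts, Or-nodes select exactly one alternative, and fixing every Or-choice leaves a parse tree whose leaf phrases feed a (here deliberately crude) text generator. This is a simplified illustration of the data structure only, with an invented vocabulary; it is not the actual I2T implementation.

```python
# Toy And-Or graph: And-nodes expand into all children (composition),
# Or-nodes pick exactly one child (alternative interpretations).
# A parse graph is the tree that remains once every Or-choice is fixed.
# Simplified illustration only -- not the actual I2T system.

AOG = {
    "scene":      ("and", ["background", "object"]),
    "background": ("or",  ["beach", "street"]),
    "object":     ("and", ["person"]),
    "beach":      ("leaf", "a sunny beach"),
    "street":     ("leaf", "a busy street"),
    "person":     ("leaf", "a person"),
}

def parse(node, choices):
    """Flatten a parse graph into its leaf phrases, given Or-node choices."""
    kind, body = AOG[node]
    if kind == "leaf":
        return [body]
    if kind == "or":
        return parse(choices[node], choices)   # pick one alternative
    phrases = []                               # "and": expand all parts
    for child in body:
        phrases.extend(parse(child, choices))
    return phrases

def describe(choices):
    """Very crude text planner: join leaf phrases into one sentence."""
    return "The picture shows " + " and ".join(parse("scene", choices)) + "."

print(describe({"background": "beach"}))
# -> The picture shows a sunny beach and a person.
```

A real I2T system attaches probabilities to the Or-branches (the stochastic image grammar) and hands the chosen parse graph to a proper text planner; the sketch only shows why fixing Or-choices turns a grammar into a single description.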