Distribution of Maximal Repeats from Tagged Sequential Data

2019/7/19. 23 AI Summer Program

Hadoop Map&Reduce Programming for Big Traffic Data Management Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data

Dr. Jing-Doo Wang (王經篤 博士), Asia University (亞洲大學)

Chinese proverb: “Lao Wang sells melons” (『老王』賣瓜). Is it sweet and juicy?

Outline
• Introduction
  – What is “Sequential Data”?
  – A scalable approach to Maximal Repeat Extraction
• Applications with Tagged Sequential Data
  – Analyzing Text Archaeology
  – Extracting Significant Travel Time Intervals
  – Mining for Biomarkers
  – Improving Quality Control
• Future Work

What is “Sequential Data”?
• Textual data: news, journal articles, etc.
  – http://edition.cnn.com/2017/11/22/health/jfk-assassination-back-pain/index.html
  – From: https://www.udn.com/news/story/7266/2834500
  – https://www.ncbi.nlm.nih.gov/pubmed/24372032
• Genomic sequences
  – From: http://blogs.nature.com/naturejobs/2015/10/08/big-data-the-impact-of-the-human-genome-project/
• Traffic transportation
  – https://tptis2015.blogspot.tw/2015/07/300-brt.html
  – https://tptis2015.blogspot.tw/2017/10/blog-post.html
• Product traceability
  – http://www.slideshare.net/5045033/ss-1002323
  – http://technews.tw/2016/04/11/tsmc-and-largan/

It’s a big data problem!
How can we mine these “sequential data”?

What kind of “features” can be extracted from sequential data?
What kind of “mineral” do you want to mine?

Journal of Supercomputing, April 2016
https://link.springer.com/article/10.1007/s11227-016-1713z?wt_mc=internal.event.1.SEM.ArticleAuthorOnlineFirst

Why use “Maximal Repeats” as features?
• Dictionary
  – How can new words or phrases be identified?
  – e.g. “just do it”, “洪荒之力” (“prehistoric powers”, a newly coined Chinese catchphrase).
• N-gram (K-mers)
  – 2-grams, 3-grams, …, 5-grams (Google Ngram Viewer).
  – The value of “N” is limited.
• Maximal Repeat
  – The length of a maximal repeat is variable.
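To make the notion of a variable-length maximal repeat concrete, here is a minimal brute-force sketch in Python. It assumes the usual definition: a substring that occurs at least twice and cannot be extended by one character to the left or right without losing occurrences. The function name `maximal_repeats` and the `min_count` parameter are hypothetical, and enumerating every substring is only practical for short strings; this is not the scalable extraction method the talk refers to. The input is the example string used on the slides.

```python
from collections import Counter


def maximal_repeats(text, min_count=2):
    """Return {substring: count} for every maximal repeat in `text`.

    Brute-force illustration: counts all substrings (overlapping
    occurrences included), so it only suits short inputs.
    """
    n = len(text)
    # Count every substring once per start position.
    counts = Counter(text[i:j] for i in range(n) for j in range(i + 1, n + 1))
    alphabet = set(text)
    result = {}
    for sub, c in counts.items():
        if c < min_count:
            continue
        # Maximal: every one-character extension occurs strictly less often.
        left_maximal = all(counts[ch + sub] < c for ch in alphabet)
        right_maximal = all(counts[sub + ch] < c for ch in alphabet)
        if left_maximal and right_maximal:
            result[sub] = c
    return result


repeats = maximal_repeats("xabcyiiizabcqabcyrxar")
for sub in sorted(repeats, key=len, reverse=True):
    print(sub, repeats[sub])
```

On this string the sketch reports `abc` and `abcy` but not `ab` or `bc`, because every occurrence of `ab` or `bc` lies inside an occurrence of `abc`.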
Example: Maximal Repeat
Pattern: “xabcyiiizabcqabcyrxar”
• Not maximal repeats: ab, bc (every occurrence lies inside an occurrence of abc)
• Maximal repeats: abc, abcy

Distinctive Pattern Mining (1)
The classes are labeled by domain experts.
Sequences:
S1:  ********************************
S2:  *********#****?***********@*****
S3:  ********************$***********
S4:  *****&*******%******************
S5:  ********************$***********
S6:  *********#****?************@****
S7:  *****&*******%******************
S8:  ********************************
S9:  *****&*******%******************
S10: *********#****?************@****
S11: ********************************

Distinctive Pattern Mining (2)
The sequences grouped by class:
********************************
********************************
********************************

*********#****?************@****
*********#****?************@****
*********#****?************@****

*****&*******%******************
*****&*******%******************
*****&*******%******************

********************$***********
********************$***********

Distinctive Pattern Mining (3)
Maximal repeats, each with its class frequency:
• #****?
• @****
• &*******%
• $**********
• *****

Applying for a U.S. Patent
From: https://www.google.com/patents/US20170255634
Patent Publication Date: Sep. 7, 2017

Applications with Tagged Sequential Data
• Trend analysis via text archaeology.
• Extracting significant travel time intervals from gantry timestamped sequences.
• Mining for biomarkers from genomic sequences.
• Improving quality control via product traceability.
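The class frequency idea from the Distinctive Pattern Mining slides can be sketched as a map step and a reduce step in plain Python. This is only an illustration under stated assumptions: the toy sequences, the class labels "A", "B", "C", and the names `map_phase` and `reduce_phase` are all hypothetical, and it runs in a single process rather than on Hadoop. Each pattern is counted once per sequence that contains it, grouped by the sequence's class label.

```python
from collections import defaultdict


def map_phase(tagged_sequences, patterns):
    """Emit one (pattern, class_label) pair per sequence containing the pattern."""
    for label, sequence in tagged_sequences:
        for pattern in patterns:
            if pattern in sequence:
                yield pattern, label


def reduce_phase(pairs):
    """Sum the emitted pairs into {pattern: {class_label: count}}."""
    distribution = defaultdict(lambda: defaultdict(int))
    for pattern, label in pairs:
        distribution[pattern][label] += 1
    return {p: dict(by_class) for p, by_class in distribution.items()}


# Hypothetical tagged sequences: (class label, sequence).
tagged = [
    ("A", "xx#ab?yy"),
    ("A", "zz#ab?ww"),
    ("B", "qq&cd%rr"),
    ("B", "tt&cd%#ab?"),
    ("C", "plain"),
]
patterns = ["#ab?", "&cd%"]

dist = reduce_phase(map_phase(tagged, patterns))
for pattern, by_class in dist.items():
    print(pattern, by_class)
```

A pattern such as `&cd%`, which appears only in class "B" here, is the kind of distinctive pattern the slides point to; a pattern spread evenly across classes would be less informative.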
From: http://www.mdpi.com/2076-3417/7/9/878

Superhighway and e-Tag
• The Electronic Toll Collection (ETC) system of the national freeways of the Republic of China (Taiwan).
• From: http://chiangchiafeng.tian.yam.com/posts/70456997
• http://news.u-car.com.tw/article/16077

Gantry Sequences of Different Vehicle Types (VT)

Gantry Timestamp Sequences
• Gantry sequences with timestamps
• Gantry timestamp sequences for different vehicle types

Significant Time Intervals of Vehicles
• http://www.7car.tw/articles/read/25927
• Gantry records (gantry ID, time):
  05F0055N,13:33
  05F0287N,13:15
  05F0309N,13:13
  05F0438N,13:06
  05F0528N,13:00
• Encoded sequence:
  05F0528N_13_M1_00
  05F0438N_13_M1_06
  05F0309N_13_M1_13
  05F0287N_13_M1_15
  05F0055N_13_M1_33
• Significant time intervals with their class frequency distribution:
  ##4 ##5 ##
  (2016-11-15_Mon_41#1#1)
  (2016-11-29_Mon_41#1#1)
  (2016-12-09_Thu_31#1#1)
  (2016-12-20_Mon_31#1#1)

Class Frequency Distribution
• Weekday vs. 24 hours per day
• Vehicle types vs. 24 hours per day

Significant Patterns of Travel Time Intervals of Vehicles

Cloud Computing Environment
• 1+5 cluster nodes
• 2+8 cluster nodes

Artificial Intelligence
• Cloud Computing
• Machine Learning
• Big Data

Leverage principle (槓桿原理)
• Science of ancient Greece: Archimedes' fulcrum for lifting the Earth.
• The lever: Maximal Repeat Extraction with Class Frequency Distribution.
• Domain knowledge (expert)
• Labels (tags)
• Sequential data
• Infrastructure (cloud computing)
• Relationship?
From: https://phycat.files.wordpress.com/2015/03/leverbigcorners.gif?w=810
Illustration: 紀玲玉

Acknowledgements (Precision Medicine)
• Jeffrey J.P. Tsai (蔡進發), President of Asia University
  – Project title: Precision Cancer Medicine Research Based on Biomedical Big Data Analysis (2/3)
  – Project number: MOST 106-2632-E-468-002
  – Project period: 106/08/01 to 107/07/31 (ROC calendar; 2017/08/01 to 2018/07/31)

Acknowledgements (Bioinformatics)
• Charles C.N. Wang
• Tsung-Chi Chen
• Wen-Ling Chan
• Rouh-Mei Hu
• Jan-Gowth Chang
• Yi-Chun Wang

Acknowledgements (Traffic Information Analysis)
• Director 黃銘崇
• Professor 連耀南
• Professor 潘信宏
• Professor 何承遠

Acknowledgements (Big Data: Hadoop Computing)
• Jazz Wang (王耀聰)
• Philip Lin (林奇暻)
• Wei-Chiu Chuang (莊偉赳), Apache Hadoop Committer/PMC member

Acknowledgements
• Hadoop cluster setup and consulting
  – SYSTEX 精誠資訊 (2017): Herb Hsu (徐啟超)
  – Athemaster 炬識科技股份有限公司 (2018): Ferrari
• Mr. 黃仁德, Information Development Office (資訊發展處), Asia University

『老王』賣瓜,自賣自誇
“Lao Wang, selling melons, praises his own goods.”

Thank you for listening!
