2019/7/19. 23 AI Summer Program: Hadoop Map&Reduce Programming for Big Traffic Data Management Applications using the Class Frequency Distribution of Maximal Repeats from Tagged Sequential Data

Dr. Jing-Doo Wang (王經篤 博士), Asia University (亞洲大學)

Chinese proverb: "Lao Wang" sells melons (『老王』賣瓜)

Is it sweet and juicy?

http://www.9ht.com/xue/44228.html
http://www.pxmart.com.tw/px/ingredients.px?id=2592

Outline

• Introduction
  – What is “Sequential Data”?
  – A scalable approach of Maximal Repeat Extraction
• Applications with Tagged Sequential Data
  – Analyzing Text Archaeology.
  – Extracting Significant Travel Time Interval
  – Mining for Biomarker.
  – Improving Quality Control.
• Future Works

What is “Sequential Data”?

• Textual Data: News, Journal Articles, etc.

http://edition.cnn.com/2017/11/22/health/jfk-assassination-back-pain/index.html

From: https://www.udn.com/news/story/7266/2834500
https://www.ncbi.nlm.nih.gov/pubmed/24372032

• Genomic Sequences

From: http://blogs.nature.com/naturejobs/2015/10/08/big-data-the-impact-of-the-human-genome-project/

• Traffic Transportation

https://tptis2015.blogspot.tw/2015/07/300-brt.html

https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg
https://tptis2015.blogspot.tw/2017/10/blog-post.html

Product Traceability


http://www.slideshare.net/5045033/ss-1002323
http://technews.tw/2016/04/11/tsmc-and-largan/
www.iconarchive.com

It’s a Big Data problem! How to mine from these “sequential data”?

http://clipart-library.com/clipart/kiKB8qLRT.htm
http://clipart-library.com/clipart/6Tr5BGG7c.htm
From: http://globe-views.com/dcim/dreams/mine/mine-03.jpg
http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg
http://www.mining.com/wp-content/uploads/2015/06/Veladero-Mine.jpg

What kind of “features” can be extracted from Sequential Data?

http://www.quickanddirtytips.com/sites/default/files/images/2499/question-mark2.jpg
http://images.slideplayer.com/16/5176005/slides/slide_2.jpg

What kind of “mineral” do you want to mine?

https://www.popsci.com/features/how-to-be-an-expert-in-anything/images/feature_video.jpg

https://media1.britannica.com/eb-media/71/143171-049-53725C29.jpg

Journal of Supercomputing, April 2016

https://link.springer.com/article/10.1007/s11227-016-1713-z?wt_mc=internal.event.1.SEM.ArticleAuthorOnlineFirst

Why use “Maximal Repeats” as features?

• Dictionary
  – How to identify new words or phrases?
  – e.g. “just do it”, “洪荒之力” (“prehistoric powers”).
• N-gram (K-mers)
  – 2-gram, 3-gram, …, 5-gram (Ngram Viewer).
  – The value of “N” is limited.
• Maximal Repeat
  – The length of a maximal repeat is variable.

Example: Maximal Repeat Pattern

“xabcyiiizabcqabcyrxar”

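• ab, bc: not maximal repeat patterns (every occurrence of “ab” is followed by “c”, and every occurrence of “bc” is preceded by “a”)
• abc, abcy: maximal repeat patterns (their occurrences differ in both the preceding and the following character)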
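The example above can be checked with a small brute-force sketch. It assumes the usual textbook definition: a substring is a maximal repeat if it occurs at least twice and its occurrences are not all preceded by the same character nor all followed by the same character. The function name `maximal_repeats` is hypothetical, and this cubic-time enumeration is only an illustration, not the scalable Hadoop-based extraction the talk refers to.

```python
from collections import defaultdict


def maximal_repeats(text, min_count=2):
    """Return {pattern: occurrence count} for the maximal repeats of text.

    Brute force: enumerate every substring, then keep those that repeat and
    cannot be extended left or right without losing an occurrence.
    """
    n = len(text)
    occurrences = defaultdict(list)  # substring -> list of start positions
    for start in range(n):
        for end in range(start + 1, n + 1):
            occurrences[text[start:end]].append(start)

    result = {}
    for pattern, starts in occurrences.items():
        if len(starts) < min_count:
            continue
        # Characters just before and just after each occurrence; None marks
        # the beginning or end of the text.
        left = {text[s - 1] if s > 0 else None for s in starts}
        right = {
            text[s + len(pattern)] if s + len(pattern) < n else None
            for s in starts
        }
        # Maximal only if both contexts vary across occurrences.
        if len(left) > 1 and len(right) > 1:
            result[pattern] = len(starts)
    return result


repeats = maximal_repeats("xabcyiiizabcqabcyrxar")
print(sorted(repeats))
```

On the slide's string, “abc” and “abcy” are reported while “ab” and “bc” are not, which matches the example. Unlike an N-gram scan, the pattern length is not fixed in advance.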

Distinctive Pattern Mining (1)

Classes: these classes are labeled by domain experts.

Sequences:

S1:******************************** 
S2:*********#****?***********@*****
S3:********************$***********
S4:*****&*******%******************
S5:********************$***********
S6:*********#****?************@****
S7:*****&*******%******************
S8:********************************
S9:*****&*******%******************
S10:*********#****?************@****
S11:********************************

Distinctive Pattern Mining (2)

Sequences grouped by class:

********************************
********************************
********************************

*********#****?************@****
*********#****?************@****
*********#****?************@****

*****&*******%******************
*****&*******%******************
*****&*******%******************

********************$***********
********************$***********

Distinctive Pattern Mining (3)

Maximal Repeats:

#****?
@****
&*******%
$**********
*****

Class Frequency

Applying for a U.S. Patent

From: https://www.google.com/patents/US20170255634
Patent Publication Date: Sep. 7, 2017
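The class frequency idea from the Distinctive Pattern Mining slides can be sketched as follows: for each pattern, count how many sequences of each class contain it. This is a minimal sketch, not the patented or published method; the function name `class_frequency`, the class labels `c1` to `c3`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def class_frequency(tagged_sequences, patterns):
    """Map each pattern to a Counter of {class label: number of sequences
    of that class containing the pattern at least once}."""
    distribution = {pattern: Counter() for pattern in patterns}
    for label, sequence in tagged_sequences:
        for pattern in patterns:
            if pattern in sequence:
                distribution[pattern][label] += 1
    return distribution


# Hypothetical toy data: (class label, sequence) pairs.
tagged = [
    ("c1", "aaaa#bb?aaaa"),
    ("c1", "aa#bb?aaaaaa"),
    ("c2", "aaaa&ccc%aaa"),
    ("c2", "a&ccc%aaaaaa"),
    ("c3", "aaaaaaaaaaaa"),
]
dist = class_frequency(tagged, ["#bb?", "&ccc%", "aa"])
for pattern, counts in dist.items():
    print(pattern, dict(counts))
```

A pattern that appears in only one class (such as `#bb?` here) is a candidate distinctive pattern, whereas one spread evenly over all classes (such as `aa`) carries little class information.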

http://haphazardstuffblog.com/wp-content/uploads/2012/01/Big-truck.jpg

Applications with Tagged Sequential Data

• Trend Analysis via Text Archaeology.
• Extracting Significant Travel Time Intervals from Gantry Timestamped Sequences.
• Mining for Biomarkers from Genomic Sequences.
• Improving Quality Control via Product Traceability.

From: http://www.mdpi.com/2076-3417/7/9/878

Superhighway

From: http://chiangchiafeng.tian.yam.com/posts/70456997

e-Tag

http://news.u-car.com.tw/article/16077

Electronic Toll Collection (ETC) system of the national freeways of the Republic of China

From: https://i.ytimg.com/vi/1ML2FFS2dJg/maxresdefault.jpg
https://attach.mobile01.com/640x480/attach/201312/mobile01-b004e8fd829e35140b3de0d91e847953.jpg

Gantry Sequences of Different Vehicle Types (VT)

Gantry Sequences with Timestamps

Gantry Sequences with Timestamps for Different Vehicle Types

Significant Time Intervals of Vehicles

http://www.7car.tw/articles/read/25927
https://buzzorange.com/wp-content/uploads/2015/04/640_4a486dc48d6f1414404627e1c45f1cf9.jpg

http://news.ltn.com.tw/photo/society/breakingnews/1088361_1

05F0055N,13:33
05F0287N,13:15
05F0309N,13:13
05F0438N,13:06
05F0528N,13:00
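The gantry records above can be turned into travel time intervals between consecutive gantries. The following is a minimal sketch, assuming each record is a `gantry,HH:MM` pair for a single vehicle on a single day (no trip crossing midnight); the function name `travel_intervals` is hypothetical and is not taken from the slides.

```python
from datetime import datetime


def travel_intervals(records):
    """Return (from_gantry, to_gantry, minutes) for consecutive gantry passes.

    records: iterable of (gantry_id, "HH:MM") pairs for one vehicle.
    Assumes all timestamps fall on the same day.
    """
    # Order the passes chronologically before pairing neighbours.
    passes = sorted(
        (datetime.strptime(time_text, "%H:%M"), gantry)
        for gantry, time_text in records
    )
    return [
        (g1, g2, int((t2 - t1).total_seconds() // 60))
        for (t1, g1), (t2, g2) in zip(passes, passes[1:])
    ]


# The five gantry passes listed on the slide.
records = [
    ("05F0055N", "13:33"),
    ("05F0287N", "13:15"),
    ("05F0309N", "13:13"),
    ("05F0438N", "13:06"),
    ("05F0528N", "13:00"),
]
for segment in travel_intervals(records):
    print(segment)
```

For the slide's vehicle this yields segments of 6, 7, 2, and 18 minutes, starting at gantry 05F0528N and ending at 05F0055N.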

Significant Time Intervals of Vehicles

05F0528N_13_M1_00
05F0438N_13_M1_06
05F0309N_13_M1_13
05F0287N_13_M1_15
05F0055N_13_M1_33

Significant time intervals: ##4 ##5 ##

Class Frequency Distribution:

(2016-11-15_Mon_41#1#1)
(2016-11-29_Mon_41#1#1)
(2016-12-09_Thu_31#1#1)
(2016-12-20_Mon_31#1#1)

Weekday vs. 24 hours per day

Vehicle Types vs. 24 hours per day

Significant Patterns of Travel Time Intervals of Vehicles

1+5 cluster nodes

2+8 cluster nodes

Cloud Computing Environment

[Figure: Artificial Intelligence, with the labels Cloud Computing, Machine Learning, and Big Data]

Leverage principle (槓桿原理): Maximal Repeat Extraction with Class Frequency Distribution

Ancient Greek science: Archimedes' fulcrum for lifting the Earth

[Figure: lever diagram with the labels Domain Expert, Knowledge, Relationship?, Labels (Tags), Sequential Data, and Infrastructure (Cloud Computing)]

From: https://phycat.files.wordpress.com/2015/03/leverbigcorners.gif?w=810
Illustration: 紀玲玉

Acknowledgements (Precision Medicine)

• Jeffrey J.P. Tsai (蔡進發), President of Asia University

Project title: Precision Cancer Medicine Research Based on Biomedical Big Data Analysis (2/3)
Project number: MOST 106-2632-E-468-002
Project period: 2017/08/01 to 2018/07/31 (ROC calendar 106/08/01~107/07/31)

Acknowledgements (Bioinformatics)

• Charles C.N. Wang
• Tsung-Chi Chen
• Wen-Ling Chan
• Rouh-Mei Hu
• Jan-Gowth Chang
• Yi-Chun Wang

Acknowledgements (Traffic Information Analysis)

• Director 黃銘崇
• Professor 連耀南
• Professor 潘信宏
• Professor 何承遠

Acknowledgements (Big-Data: Hadoop Computing)

• Jazz Wang (王耀聰)
• Philip Lin (林奇暻)
• Wei-Chiu Chuang (莊偉赳)
  – Apache Hadoop Committer/PMC member

Acknowledgements

• Hadoop Cluster Set Up and Consulting
  – SYSTEX 精誠資訊 (2017): Herb Hsu (徐啟超)
  – Athemaster 炬識科技股份有限公司 (2018): Ferrari
• Mr. 黃仁德, Information Development Office (資訊發展處), Asia University

『老王』賣瓜,自賣自誇

Lao Wang selling melons praises his own goods

http://www.9ht.com/xue/44228.html
http://www.pxmart.com.tw/px/ingredients.px?id=2592

Thank you for listening!

http://www.pptschool.com/250.html

www.flickr.com www.slideshare.net