2014-07-02

Living in Copious* Data with Vector Functional Programming
And Now For Something Completely Different

Dave Thomas
www.davethomas.net
• Bedarra Research Labs
• GOTO/YOW! Conferences
• Carleton University
• Queensland University of Technology
• NITCA

* Dean Wampler

© 2013 Bedarra Research Labs. All rights reserved.

First Primes
    ÿ R Œ , R ¥ TextR © R ¨ Ø 1 + 15 ê 2 3 5 7 11 13
Example without variables:
    ((¬◦(⊢ ∈ ,◦(⊢ ×/ ⊢))) © ⊢) ↓ 1+ 15
or in ASCII J:
    ((-.@:(] e. ,@:(] */ ]))) # ]) }. >: i. 100

Outline
1. Motivation - Thinking in Copious Data
2. Why we decided on Vector FP?
3. Back to The Future of Array Languages
4. A Brief Taste of Vector Functional Programming
5. Yes!SQL - Just add functions and …
6. Summary

Motivation – Tools for Thinkers
[Figure labels: Analysis, Results]

Thinker Data Sets
● Small (sensors) to Large (Terabytes to Petabytes)
● Semi Structured and Mixed Media
● Temporal (Nanoseconds to Decades) and Timespans (Time Intervals)
● Messy Data (80% of data science is cleaning!)
● Missing Data (often denoted by null values)
● Out of Range Data – sensor failure, errors (denoted by +infinity and -infinity)
● Uncertain Data (likely, highly unlikely) e.g. Dempster Shafer Reasoning

Top Down Data Language Design Choices
1. A few elegant and simple abstractions
2. A dynamic object model and garbage collector
3. Everything is an object (list, set)...
   ● !! the first N < 5 implementations will suck in space and time
   ● !! interop with native HW will trail HW
   ● !! implementers will spend decades trying to make elegant => fast
4. Language is extended by libraries in the same language
   ● !! the libraries will often be bloated and of variable quality, with changing APIs …
   ● !! new libraries will interact with JIT runtime

Bottom Up Data Language Design Choices
1. Needs to be fast!
2. Needs to be small (compact)
   Hence (1, 2) needs to be close to the metal in terms of runtime types and data structures
3. Needs basic safety
   Hence must pay for nulls, index range checking...
4. Needs to support massive data
5. Needs scalable concurrency
   Hence (4, 5) needs to be value versus reference based and needs to support data parallel and simple actor concurrency
6. It will be a challenge to design a normal developer language
   Hence needs an expert language and DSLs for specific application users (i.e. the GPU problem)

Vector FP

Why Functional Vector VMs vs Object VMs
• Array VMs vs. OVMs
  • Everything is boxed - No need for boxing and unboxing!
    … Simpler GC
  • Support for all native machine types
  • Virtual machine is smaller… can easily be held in instruction and data caches
  • Values are shared until modified
• Arrays are Column Stores => Table is a set of columns
  • Reduces the impedance between Objects and Records
  • Vectors are trivially and efficiently serialized
  • Vectors are machine values
  • Vector operations stream data through caches
• Array libraries use efficient algorithms coded at machine level
• Arrays take less space than object collections
• Data parallelism is easily implemented

Why Imperative Vector Functional vs Lazy FP VM
• Dynamic Versus Static Types
  • Fixed Types reduce Abstraction Bloat (Frameworks)
  • Interactive exploration via Computation versus Type Refinement – Engineers and Business
  • Operations Defined over all types
• Imperative vs Pure (Lazy)
  • Efficient Exploitation of imperative HW
  • Simpler runtime

A Programming Language
• A Formal Description of SYSTEM/360, Falkoff, Adin D.; Iverson, Kenneth E.; Sussenguth, Edward H. (1964)
• Language as an Intellectual Tool: From Hieroglyphics to APL, Donald B. McIntyre
• Notation as a Tool of Thought, Ken Iverson's 1979 Turing Award Lecture

The Vector (Array) System
• APL – K. Iverson
• Nested Array Theory – T. More - APL2, Q'Nial, A+
• Dyalog APL, APL 2000 – Unix Nial, and .NET - R, Matlab, PyNumerics, APL2 Octave..
• J - Vector FP “CISC” language
• k, k2+ksql, k3/kdb, k4/kdb+...
• Vector FP “RISC” language, q financial DSL for kdb+
• Julia
[Figure labels: A+, APL, J, k]

A Programming Language (APL) – Ken Iverson [1]
• Touch and Tweet
• Extremely Concise Idiomatic [2,3] Programs (one liners)
• Life is an array
• Efficient generalized array operators, e.g. +/ “plus reduce”; +.* …
• Applicative Operator Style with Monadic and Dyadic Functions with “No Stinking Loops” [3]

1. Ken Iverson, Notation as a Tool of Thought, 1979 ACM Turing Award
2. Alan Perlis, In Praise of APL – A Language for Lyrical Programming, SIAM News, 1977 and Programming with Idioms in APL, APL Conference, 1979
3. Steve Apter, nostinkingloops.com

“Chasing Men That Stare at Arrays” by Catherine Lathwell
Lyrical, Idiomatic, Shape Shifters, Transformers, Iron Chef, Alien Coders, Puzzlers, One Liners, Line Noise Generators, Anti-Loop Conspirators, Secret Society …

Common Features of Vector Languages
• single data type is an array or vector
• applicative, expression oriented
• execution is right to left with no precedence (“left of right”)
• applications = expressions plus a few functions
• transformational expressions with assignment
• most functions are monadic or dyadic
• composition of functions/operations
• large library of built-in generalized operations
• local and global scope (no closures)
• application programmer is clearly a consumer, not an implementer
Each language has significant differences in its array model, operations and library!

No Stinking Loops*
• Operations generalized over collections..
• Each, EachLeft, EachRight, Peach maps
• Conditional assignment
• Generalized Amend
• Reduce, Over, Scan provide bottom-up loop-free enablers, e.g. pointer walking
• …; Do; While …
* www.nostinkingloops.com Steve Apter

How to learn an Array language?
1. Assume it will take weeks, not hours or days
2. Don’t write programs, write short expressions
3. Learn the nouns, verbs and adverbs by doing idioms (Kata of Arrays)
4. Focus on expressions as data/shape transformers
5. Write many small functions, don’t focus on smallest!
6. Start with normal control structures, move away from loops later
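
The collection operators named on the No Stinking Loops slide can be approximated in Python. This is only a rough sketch: the helper names (each, each_left, each_right, over, scan, primes_upto) are illustrative and are not the k or q primitives themselves. Each helper hides the iteration once, so the calling expressions stay loop free. The last function restates the First Primes idea (keep the candidates that do not appear in their own multiplication table) under the assumption that candidates start at 2.

```python
from functools import reduce
from itertools import accumulate
import operator


def each(f, xs):
    """Each: apply a monadic function to every item of xs."""
    return [f(x) for x in xs]


def each_left(f, xs, y):
    """EachLeft: pair every item of xs with the whole of y."""
    return [f(x, y) for x in xs]


def each_right(f, x, ys):
    """EachRight: pair the whole of x with every item of ys."""
    return [f(x, y) for y in ys]


def over(f, xs):
    """Over (reduce): fold a dyadic function across the items."""
    return reduce(f, xs)


def scan(f, xs):
    """Scan: like over, but keep every intermediate result."""
    return list(accumulate(xs, f))


def primes_upto(n):
    """Keep candidates 2..n that are absent from their own multiplication table."""
    r = range(2, n + 1)
    table = {a * b for a in r for b in r}  # ravelled outer product, held as a set
    return [x for x in r if x not in table]


print(over(operator.add, [1, 2, 3, 4]))        # plus reduce
print(scan(operator.add, [1, 2, 3, 4]))        # running sums
print(each_left(operator.mul, [1, 2, 3], 10))  # every left item times 10
print(primes_upto(15))
```

The outer-product approach to primes is quadratic in space and time; it is shown for its shape as an expression, not as an efficient sieve.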
k and q Culture – Concepts and Idioms

“Arthur Whitney has designed and personally built several languages. In every case, his avowed purpose has been to make the language as concise and expressive as possible. Except for the fact that q has English words for some primitives, q is perhaps the most concise and the most powerful yet. Whereas I don't share Arthur's love for brevity, I am a power addict. The fact that this language allows programmers to mix very powerful database primitives with a powerful programming language (including inter-process communication) and to execute at high speed makes it a wonderful achievement. Arthur, however, may be most proud of the fact that the executable is only 100 kilobytes. The guy is pretty awesome.”
Dennis Shasha, Kdb+ Database and Language Primer

Arthur Whitney: A+, J*, k, q, Kdb
• A+ – an improved APL for finance applications
• J – Ken Iverson's successor to APL
• k – k1, k2, k3, k4 family
• Kdb, Kdb+ – database with ksql and k2, k3
• q – an embedded financial DSL for Kdb+
• Kos – operating system in k4
* Wrote the original J interpreter

Sudoku in k
    p:+{(=x)x}'p,,3/:_(p:,/'+:\9#'!9)%3
    f:{$[&/x;,x;,/f'@[x;i;:;]'&27=x[,/p i:x?0]?!10]}

Sudoku in q
    P:flip{(group x)x}each P,enlist 3 sv floor (P:raze each flip scan 9#'til 9)%3
    F:{$[min x;enlist x;raze F each@[x;i;:;]each where 27=x[raze P i:x?0]?til 10]}

A+, k1, k2, Kdb, k3, q/KDB+, k4

Text Editor in K
    /key(return back ^delete)
    kr:kx:{u,:,(j,j+#x;*k_a);e[k]x}; kb:{$[=/k;J j-1 0;];kx""};cd:{J j+!2;kx""}
    /edit(undo cut paste)
    e:{a::?[a;x;y];J(*x)+#y};cz:{$[#u;e/_`u;]}; cx:{kx cc`};cv:{kx@9'`}
    /save
    cs:{n::$[#n;n;0'""]1:a;r::a}

q and KDB+ Syntax

Nouns – variables, constants, functions
    a, 42.42; ("xyz";39;`Fred); myfunction[x;y;z];

Verbs – primitive and infix, e.g.
    a+b ; a infixfcn b;

Note: for readability, juxtaposition may be used for monadic functions, indexing, and function projections
    a i is short for a[i];
    mymonadic `apple is short for mymonadic[`apple];
    mymonadic:mydyadic 42 ;

Adverbs – modify dyadic functions and verbs to new verbs.
Each (') modifies dyadic functions and verbs to apply to the items of lists instead of the lists themselves. For example,
    1 2 3,4 5 / join
    1 2 3 4 5
    (1 2 3;"abcd"),'(4 5;"e") / Join-Each (,')
    (1 2 3 4 5;"abcde")

Yes! SQL + Functional Vector Programming
    select date by stock, myavgs[3; mydouble price] from trade
mydouble and myavgs are user defined functions
    select from t where c by g
is syntactic sugar for
    ?[t;c;g]

© 2010 Bedarra Research Labs. All rights reserved.
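
The Join-Each example and the select-by query can be mimicked in Python as a rough sketch. Several things here are assumptions rather than anything the slides state: mydouble is taken to double each price, myavgs[3; …] is taken to be a 3-item moving average, the trade table and its values are invented, and select_by is a hypothetical helper that only loosely models grouping a column store and applying a vector function per group. It is not the q implementation.

```python
from collections import defaultdict


def join_each(xs, ys):
    """Join-Each (,'): join corresponding items instead of the lists themselves."""
    return [x + y for x, y in zip(xs, ys)]


def mydouble(xs):
    """Assumed stand-in for the slide's mydouble: double every value."""
    return [2 * x for x in xs]


def myavgs(n, xs):
    """Assumed stand-in for myavgs: n-item moving average (shorter window at the start)."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - n + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def select_by(table, key, column, fn):
    """Group one column by a key column, then apply fn to each group's vector."""
    groups = defaultdict(list)
    for k, v in zip(table[key], table[column]):
        groups[k].append(v)
    return {k: fn(vs) for k, vs in groups.items()}


# A table as a set of columns (invented data).
trade = {
    "stock": ["ibm", "msft", "ibm", "msft", "ibm"],
    "price": [10.0, 20.0, 12.0, 22.0, 14.0],
}

print(join_each([[1, 2, 3], "abcd"], [[4, 5], "e"]))
print(select_by(trade, "stock", "price", lambda p: myavgs(3, mydouble(p))))
```

Holding the table as a dictionary of columns mirrors the "table is a set of columns" point: the user defined functions receive whole column vectors, not individual rows.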