2018 Predictive Analytics Symposium

Session 18: Can You Hear Me Now: Getting Clarity with Fuzzy Logic


2018 SOA Predictive Analytics Symposium

Can You Hear Me Now: Getting Clarity with Fuzzy Logic
September 20, 2018

The Paradox of the Heap

Sorites Paradox, or Paradox of the Heap:
- Premise 1: A billion grains of sand constitute a heap.
- Premise 2: A heap of sand minus one grain is still a heap.
- Conclusion: A single grain of sand must still be a heap.

Fuzzy logic provides a way out of the paradox, by establishing a range of “heapiness” from “definitely not a heap” to “definitely a heap”.
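One way to picture this range is a membership function that maps a grain count to a degree of heapiness between 0 and 1. The sketch below is illustrative only: the function name `heapiness` and the two thresholds (100 and 10,000 grains) are assumptions, not values from the presentation.

```r
# Degree of "heapiness" for a pile of n grains, between 0 and 1.
# The thresholds are hypothetical and chosen only for illustration.
heapiness <- function(n, not_heap = 100, heap = 10000) {
  # Linear ramp between the two thresholds, clipped to [0, 1].
  pmin(1, pmax(0, (n - not_heap) / (heap - not_heap)))
}

heapiness(c(1, 100, 5050, 10000, 1e9))
# returns 0.0 0.0 0.5 1.0 1.0

# Removing one grain lowers the degree only slightly instead of
# flipping the pile from "heap" to "not a heap".
heapiness(5050) - heapiness(5049)
```

Under this reading, Premise 2 is only approximately true at each step, so the chain of tiny reductions can end at "definitely not a heap" without contradiction.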

Note: Eubulides is also credited as the first to declare that ‘today is opposite day’!

Our world is full of fuzzy questions.

• Is Minneapolis a big city?
• Am I bald?
• Is this presentation interesting?
• Is air travel convenient?
• Is the weather nice?
• Is it late in the day?

Aren’t we just playing games with precision in language? Or does this matter?

This sounds a lot like probability theory.

From Kosko:
• “The chief, but superficial similarity, is that both systems describe uncertainty with numbers in the unit interval [0,1].”
• “The key distinction concerns how the systems deal simultaneously with a thing A and its opposite A’.”
• “Fuzziness describes event ambiguity. It measures the degree to which an event occurs, not whether it occurs. Randomness describes the uncertainty of event occurrence. An event occurs or not.”
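The treatment of A and its opposite can be shown with a small numeric sketch. It assumes the common min, max, and 1 - x operators for fuzzy AND, OR, and NOT, which the slides do not specify, and the membership value 0.7 is hypothetical.

```r
# Degree to which a statement such as "it is late in the day" holds.
# The value 0.7 is a hypothetical membership degree.
mu_A     <- 0.7
mu_not_A <- 1 - mu_A                 # standard fuzzy complement

# Fuzzy logic: "A and not A" uses min, "A or not A" uses max.
fuzzy_and <- min(mu_A, mu_not_A)     # about 0.3, not 0
fuzzy_or  <- max(mu_A, mu_not_A)     # 0.7, not 1

# Probability: an event and its complement never occur together,
# and one of the two always occurs.
p_A   <- 0.7
p_and <- 0                           # P(A and not A)
p_or  <- p_A + (1 - p_A)             # P(A or not A) = 1

c(fuzzy_and = fuzzy_and, fuzzy_or = fuzzy_or, p_and = p_and, p_or = p_or)
```

Both systems use the number 0.7, but only the fuzzy version lets a thing partly overlap with its own opposite.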

Kosko (1989). “Fuzziness vs. Probability”. International Journal of General Systems. http://sipi.usc.edu/~kosko/Fuzziness_Vs_Probability.pdf

What is Fuzzy Logic?

• Reality! It is not ‘crisp’ logic.
• Crisp Logic is a new name for Boolean Logic (George Boole, 1847)
  • Binary logic
  • Set membership is 0 (false, out) or 1 (true, in)
• Fuzzy Logic allows interim values (Lotfi Zadeh, 1965)
  • Set membership can be between 0 (completely out) and 1 (all in)


Fuzzy Logic: Process (The Fuzzy Controlled Machine)

1. Fuzzification – convert your input and output to linguistic values (crisp -> fuzzy), utilizing ranges and membership functions.

2. Apply rules (from your experience or knowledge base) using fuzzy logic.

3. Defuzzification – convert your results to the form you want (fuzzy -> crisp).

We do this every day! Consider the decision on whether to pull out into traffic.
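The three steps can be sketched in base R for the traffic decision. Everything specific here is an assumption made for illustration: the function names `ramp_up` and `pull_out_score`, the 2 to 10 second range for a "long" gap, the 0 to 100 willingness scale, and the use of min for clipping, max for combining, and a centroid for defuzzification.

```r
# Ramp from 0 at `lo` to 1 at `hi`, clipped to [0, 1].
ramp_up <- function(x, lo, hi) pmin(1, pmax(0, (x - lo) / (hi - lo)))

pull_out_score <- function(gap_seconds) {
  # 1. Fuzzification: how "long" or "short" is the gap in traffic?
  gap_long  <- ramp_up(gap_seconds, 2, 10)
  gap_short <- 1 - gap_long

  # Output universe: willingness to pull out, from 0 (wait) to 100 (go).
  u    <- 0:100
  wait <- pmax(0, 1 - u / 50)      # strongest at 0, gone by 50
  go   <- pmax(0, (u - 50) / 50)   # starts at 50, strongest at 100

  # 2. Rules: IF gap is short THEN wait; IF gap is long THEN go.
  #    Each rule clips its output set at the rule's firing strength,
  #    and the clipped sets are combined with max.
  fired <- pmax(pmin(gap_short, wait), pmin(gap_long, go))

  # 3. Defuzzification: centroid of the combined output set.
  sum(u * fired) / sum(fired)
}

pull_out_score(3)   # mostly "wait": the score falls below 50
pull_out_score(9)   # mostly "go": the score falls above 50
```

A crisp threshold on the final score (for example, go above 50) would then turn the result back into a yes or no decision.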

Fuzzy Logic: Process

Source: http://cs.bilkent.edu.tr/~zeynep/files/short_fuzzy_logic_tutorial.pdf

Fuzzy Logic in R

Example Source: http://juandes.github.io/FuzzyLogic-R/

library(sets)
sets_options("universe", seq(1, 100, 0.5))

variables <- set(
  temperature = fuzzy_partition(varnames = c(cold = 30, good = 70, hot = 90), sd = 5.0),
  humidity = fuzzy_partition(varnames = c(dry = 30, good = 60, wet = 80), sd = 5.0),
  precipitation = fuzzy_partition(varnames = c(no.rain = 30, little.rain = 60, rain = 90), sd = 7.5),
  weather = fuzzy_partition(varnames = c(bad = 40, ok = 65, perfect = 80), FUN = fuzzy_cone, radius = 10)
)
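To give a feel for what these partitions describe, here is a base R sketch of the two membership shapes. It assumes that the `sd` argument produces a bell curve peaking at 1 over each label's center and that `fuzzy_cone` is a triangle reaching 0 at `radius`; the helper names `bell` and `cone` are hypothetical and this is not the package's own implementation.

```r
# Bell-shaped membership: 1 at `center`, falling off with spread `sd`.
bell <- function(x, center, sd) exp(-(x - center)^2 / (2 * sd^2))

# Cone (triangular) membership: 1 at `center`, 0 at `radius` away or more.
cone <- function(x, center, radius) pmax(0, 1 - abs(x - center) / radius)

# Temperature labels from the slide: cold = 30, good = 70, hot = 90, sd = 5.
temp <- 75
round(c(cold = bell(temp, 30, 5),
        good = bell(temp, 70, 5),
        hot  = bell(temp, 90, 5)), 3)
# good is about 0.607, hot about 0.011, cold effectively 0

# Weather label ok = 65 with radius = 10: a score of 60 is halfway in.
cone(60, 65, 10)
# returns 0.5
```

A temperature of 75 is therefore mostly "good" and only marginally "hot", which is the kind of graded input the rules operate on.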

# Fuzzy rules
rules <- set(
  fuzzy_rule(temperature %is% good && humidity %is% dry && precipitation %is% no.rain, weather %is% perfect),
  fuzzy_rule(temperature %is% hot && humidity %is% wet && precipitation %is% rain, weather %is% bad),
  fuzzy_rule(temperature %is% cold, weather %is% bad),
  fuzzy_rule(temperature %is% good || humidity %is% good || precipitation %is% little.rain, weather %is% ok),
  fuzzy_rule(temperature %is% hot && precipitation %is% little.rain, weather %is% ok),
  fuzzy_rule(temperature %is% hot && humidity %is% dry && precipitation %is% little.rain, weather %is% ok)
)

model <- fuzzy_system(variables, rules)
print(model)
plot(model)

example.1 <- fuzzy_inference(model, list(temperature = 75, humidity = 0, precipitation = 70))
gset_defuzzify(example.1, "centroid")
plot(example.1)

> gset_defuzzify(example.1,"centroid")
[1] 65

example.2 <- fuzzy_inference(model, list(temperature = 30, humidity = 0, precipitation = 70))
gset_defuzzify(example.2, "centroid")
plot(example.2)

> gset_defuzzify(example.2,"centroid")
[1] 49.9

Case Study: Determining Practice Readiness

Indicators of Practice Readiness

• Care Quality
• Panel Density
• Provider Compensation
• Practice Structure

What Fuzzy Logic Might Look Like

IF High Quality Care
AND High Panel Density
AND Strong Financial Incentives in Compensation
AND Practice Structure with Strong Accountability
THEN Highly Ready Practice
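A rule of this form can be evaluated by combining the membership degrees of its conditions. The sketch below assumes the common min operator for AND; the four membership values and the 0.75 cutoff are hypothetical and do not come from the case study data.

```r
# Hypothetical membership degrees for one practice (0 = not at all, 1 = fully).
practice <- c(high_quality_care     = 0.8,
              high_panel_density    = 0.6,
              strong_incentives     = 0.9,
              strong_accountability = 0.7)

# With min as the AND operator, the rule fires as strongly
# as its weakest condition.
highly_ready <- min(practice)
highly_ready
# returns 0.6

# A crisp version with a hypothetical 0.75 cutoff on every indicator
# would reject the same practice outright.
crisp_ready <- all(practice >= 0.75)
crisp_ready
# returns FALSE
```

The fuzzy result keeps the information that this practice is fairly close to ready, with panel density as the limiting indicator.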

Panel Density


FRBS Package

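• Publicly available package in R
• frbs.learn has 14 different algorithms to derive rules for fuzzy logic systems
• predict applies the fuzzy logic system to a separate data set and reports logical results
• Sample Code

library(frbs)

# Learn a weighted fuzzy rule-based classifier from the training data,
# using 3 linguistic labels per variable and trapezoidal membership functions.
object.frbcs.w <- frbs.learn(data.train = train_df,
                             range.data = range.data.input,
                             method.type = "FRBCS.W",
                             control = list(num.labels = 3, type.mf = "TRAPEZOID"))

# Apply the learned rules to the hold-out data.
pred <- predict(object.frbcs.w, apply_subset)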

Building Fuzzy Rule Based System

• Quality and financial outcomes results for a local network providing care to Medicare Advantage members.
• Used 2014 data to train fuzzy logic system
• Derive rule set to determine Risk Adjusted Cost category (low/medium/high) based on Panel Density and Quality Measure
• Apply derived rule set to 2015 data and compare predicted Risk Adjusted Cost category to actual.
• Significant limitation is that the data set is very small

Available Data

• Panel Density
• PCP/Specialist Ratio
• Rate of ED Utilization
• Rate of Advanced Imaging Utilization
• Risk Adjusted Costs

Analysis Results – Derived Ruleset

Rule   PCP/Spec   Panel Den.   ED Rate   Adv Img Rt   Risk Adj Cost
1      Low        Low          High      Low          High
2      High       Low          Low       High         Low
3      Low        High         Low       High         Medium
4      Medium     Medium       Low       High         Low
5      Low        Low          Low       Medium       Medium
6      Low        Low          Medium    High         High
7      Low        Low          Medium    Low          Medium
8      Medium     Low          Medium    High         Low
9      Low        Medium       Low       Low          Medium
10     Low        Low          Low       Low          Medium
11     Low        Low          Low       Medium       Low
12     Medium     Medium       Medium    Medium       Low

Analysis Results – Confusion Matrix

                Actual
Predicted    Low    Medium    High
Low            2         2       0
Medium         2         8       0
High           0         0       2
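The counts in the confusion matrix can be summarized as an overall accuracy. The sketch below simply re-enters the nine counts shown above; the variable names are illustrative, and because the matrix is symmetric the result does not depend on which axis is read as predicted or actual.

```r
# Confusion matrix as reported on the slide.
cats <- c("Low", "Medium", "High")
cm <- matrix(c(2, 2, 0,
               2, 8, 0,
               0, 0, 2),
             nrow = 3, byrow = TRUE,
             dimnames = list(cats, cats))

# Overall accuracy: correctly classified cases over all cases.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
# returns 0.75

# Share of each row that falls on the diagonal.
diag(cm) / rowSums(cm)
# Low 0.5, Medium 0.8, High 1.0
```

With only 16 cases in total, these rates are sensitive to a single reclassification, which fits the small-data limitation noted for the case study.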

Bonus Material

Fuzzy Matching

Will the real ‘Bob’ Clemente please stand up?

Name / Matches
B,l,l,n (4)
B,o,b,n,n,e (6)
O,b,e,t,o,c,l,e,m,e,n,t,e (13)
Roman Meijas
R,o,m,n,M,e (6)
O,e,C,t,e (5)
Harry Bright
B,t (2)
Dick Schofield
C,c,o,e (4)
Harry Simpson
M,o,n (3)
O,c,N,e,o,n (6)
c,t (2)

R package ‘stringdist’ can produce a variety of fuzzy text string metrics.

> stringdist(c("Bob Clemente"),c("Bill Virdon","Bob Skinner","Roberto Clemente", "Ed Clemente"))
[1] 10 7 5 3

Default method – ‘optimal string alignment’

> stringdist(c("Bob Clemente"),c("Bill Virdon","Bob Skinner","Roberto Clemente", "Ed Clemente"), method='qgram')
[1] 13 11 6 5

‘Q-Gram’ still confused by Ed

> stringdist(c("Bob Clemente"),c("Bill Virdon","Bob Skinner","Roberto Clemente", "Ed Clemente"), method='cosine')
[1] 0.63485163 0.41165159 0.08333333 0.14250707

Cosine distance between Q-gram profiles finally gets it

More info: https://cran.r-project.org/web/packages/stringdist/stringdist.pdf
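The q-gram and cosine figures reported above can be reproduced in base R, which helps show why the two metrics rank "Roberto Clemente" and "Ed Clemente" differently. This is a sketch, not the package's implementation: it assumes single-character q-grams (q = 1), and the helper names `char_profiles`, `qgram_dist`, and `cosine_dist` are hypothetical.

```r
# Count every single character (q = 1) of two strings on a shared alphabet.
char_profiles <- function(a, b) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  alphabet <- union(ca, cb)
  list(x = vapply(alphabet, function(ch) sum(ca == ch), numeric(1)),
       y = vapply(alphabet, function(ch) sum(cb == ch), numeric(1)))
}

# Q-gram distance: total absolute difference between the two count profiles.
qgram_dist <- function(a, b) {
  p <- char_profiles(a, b)
  sum(abs(p$x - p$y))
}

# Cosine distance: 1 minus the cosine of the angle between the profiles.
cosine_dist <- function(a, b) {
  p <- char_profiles(a, b)
  1 - sum(p$x * p$y) / sqrt(sum(p$x^2) * sum(p$y^2))
}

qgram_dist("Bob Clemente", "Roberto Clemente")   # 6
qgram_dist("Bob Clemente", "Ed Clemente")        # 5
cosine_dist("Bob Clemente", "Roberto Clemente")  # 0.08333333
cosine_dist("Bob Clemente", "Ed Clemente")       # 0.1425071
```

The raw count difference penalizes "Roberto Clemente" for its extra letters, while the cosine version compares the direction of the profiles and so is less affected by the longer name.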

https://www.joyofdata.de/blog/comparison-of-string-distance-algorithms/

And not just names, of course!
- Addresses (123 S. Main St. vs 123 South Main Street)
- City / County Name (St. vs Saint)
- Spellcheck / Autocomplete
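For addresses, a common first step before any distance metric is to expand known abbreviations so that equivalent strings become identical. The sketch below is a minimal base R illustration; the `abbrev` lookup table and the function `normalize_address` are hypothetical, and a real table would need to handle context (for example, "St." meaning Saint in a city name).

```r
# Hypothetical lookup of street abbreviations (lower case, no periods).
abbrev <- c(s = "south", n = "north", e = "east", w = "west",
            st = "street", ave = "avenue", rd = "road")

normalize_address <- function(x) {
  tokens <- strsplit(tolower(x), "\\s+")[[1]]   # split on whitespace
  tokens <- sub("\\.$", "", tokens)             # drop a trailing period
  hit <- tokens %in% names(abbrev)
  tokens[hit] <- abbrev[tokens[hit]]            # expand known abbreviations
  paste(tokens, collapse = " ")
}

normalize_address("123 S. Main St.")
# returns "123 south main street"

normalize_address("123 S. Main St.") == normalize_address("123 South Main Street")
# returns TRUE
```

Strings that still differ after normalization can then be compared with a string distance, so the fuzzy metric only has to absorb genuine typos and variations.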