Coccinelle: 10 Years of Automated Evolution in the Linux Kernel

Coccinelle: 10 Years of Automated Evolution in the Linux Kernel

Coccinelle: 10 Years of Automated Evolution in the Linux Kernel Julia Lawall (Inria/LIP6) June, 2018 1 Coccinelle Goal: Automating bug finding and evolutions for Linux kernel developers. • Development began in 2006. • Goal to automate porting of Linux 2.4 drivers to Linux 2.6. Requirements: • Accessible to Linux developers. • Reasoning about code as it appears to the developer. • Treat a large subset of C. • Ensure continuing maintainability. 2 Usage in the Linux kernel Coccinelle developers Outreachy interns Dedicated user 0-day Kernel maintainers Others 400 200 # of commits 0 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 3 How did we get here? 4 Coccinelle design: expressivity Semantic patches: Patches with some abstraction. • Remain close to the C level. • A few extensions to control the level of abstraction. @@ expression x,E1,E2; @@ - x = kmalloc(E1,E2); + x = kzalloc(E1,E2); ... - memset(x, 0, E1); 5 Coccinelle design: expressivity Semantic patches: Patches with some abstraction. • Remain close to the C level. • A few extensions to control the level of abstraction. @@ expression x,E1,E2,E3; identifier f; @@ - x = kmalloc(E1,E2); + x = kzalloc(E1,E2); ... when != (<+...x...+>) = E3 when != f(...,x,...) - memset(x, 0, E1); 6 Coccinelle design: performance Goal: Be usable on a typical developer laptop. Target code base: 5MLOC in Feb 2007, 16.5MLOC in Jan 2018. Choices: • Intraprocedural, one file at a time. • Process only .c files, by default. • Include only local or same-named headers, by default. • Use heuristics to parse macro uses. • Provide best-effort type inference, but no other program analysis. 7 Coccinelle design: correctness guarantees Ensure that outermost terms are replaced by like outermost terms @@ expression x,E1,E2,E3; identifier f; @@ - x = kmalloc(E1,E2); + x = kzalloc(E1,E2); ... - memset(x, 0, E1); No other correctness guarantees: • Bug fixes and evolutions may not be semantics preserving. • Improves efficiency and expressiveness. • Rely on developer’s knowledge of the code base and ease of creating and refining semantic patches. 8 Coccinelle design: dissemination strategy Show by example: • June 1, 2007: Fix parse errors in kernel code. • July 7, 2007: Irq function evolution – Updates in 5 files, in net, atm, and usb • July 6, 2007: kmalloc + memset −! kzalloc – Updates to 166 calls in 146 files. – A kernel developer responded “Cool!”. – Violated patch-review policy of Linux. • July 2008: Use by a non-Coccinelle developer. • October 2008: Open-source release. 9 • But some new needs emerged over time... Initial assessment • Useful: By the Coccinelle developers to contribute to the Linux kernel. • Usable: By outside developers to contribute to the Linux kernel. 10 Initial assessment • Useful: By the Coccinelle developers to contribute to the Linux kernel. • Usable: By outside developers to contribute to the Linux kernel. • But some new needs emerged over time... 10 Confrontation with the real world: • Many language evolutions: C features, metavariable types, etc. • Position variables. – Record and match position of a token. • Scripting language rules. – Original goal: bug finding, eg buffer overflows. – Used in practice for error reporting, counting, etc. Expressivity evolutions Original hypothesis: Linux kernel developers will find it easy and convenient to describe needed code changes in terms of fragments of removed and added code. 11 Expressivity evolutions Original hypothesis: Linux kernel developers will find it easy and convenient to describe needed code changes in terms of fragments of removed and added code. Confrontation with the real world: • Many language evolutions: C features, metavariable types, etc. • Position variables. – Record and match position of a token. • Scripting language rules. – Original goal: bug finding, eg buffer overflows. – Used in practice for error reporting, counting, etc. 11 Position variables and scripts @ r @ expression object; position p @@ ( drm_connector_reference@p(object) | drm_connector_unreference@p(object) ) @script:python@ object << r.object; p << r.p; @@ msg="WARNING: use get/put helpers to reference and dereference %s"%(object) coccilib.report.print_report(p[0], msg) 12 Confrontation with the real world: • 1, 5, or 15 MLOC is a lot of code. • Parsing is slow, because of backtracking heuristics. Evolutions: • Indexing, via glimpse, id-utils. • Parallelism, via parmap. Performance evolutions Original hypothesis: Limiting analysis scope via intraprocedural analysis, ignoring headers was good enough. 13 Evolutions: • Indexing, via glimpse, id-utils. • Parallelism, via parmap. Performance evolutions Original hypothesis: Limiting analysis scope via intraprocedural analysis, ignoring headers was good enough. Confrontation with the real world: • 1, 5, or 15 MLOC is a lot of code. • Parsing is slow, because of backtracking heuristics. 13 Performance evolutions Original hypothesis: Limiting analysis scope via intraprocedural analysis, ignoring headers was good enough. Confrontation with the real world: • 1, 5, or 15 MLOC is a lot of code. • Parsing is slow, because of backtracking heuristics. Evolutions: • Indexing, via glimpse, id-utils. • Parallelism, via parmap. 13 Confrontation with the real world: Mostly, developer control over rules is good enough. Correctness guarantee evolutions Original hypothesis: Developer control over rules is good enough. 14 Correctness guarantee evolutions Original hypothesis: Developer control over rules is good enough. Confrontation with the real world: Mostly, developer control over rules is good enough. 14 Confrontation with the real world: • Showing by example generated initial interest. • Organized four workshops: industry participants. • Presentations at developer conferences: FOSDEM, Linux Plumbers, etc. • LWN articles by kernel developers. Dissemination strategy evolutions Original hypothesis: Show by example rather than attempting to impose. 15 Dissemination strategy evolutions Original hypothesis: Show by example rather than attempting to impose. Confrontation with the real world: • Showing by example generated initial interest. • Organized four workshops: industry participants. • Presentations at developer conferences: FOSDEM, Linux Plumbers, etc. • LWN articles by kernel developers. 15 Status: Performance 4;000 2 cores (4 threads) 2;000 0 elapsed time (sec.) semantic patches 40;000 files considered 20;000 0 number of files semantic patches Based on the 59 semantic patches in the Linux kernel. 16 Status: Use of new features • 3325 commits contain semantic patches. • 18% use position variables. • 5% use scripts. • 43% of the semantic patches using position variables or scripts are from outside the Coccinelle team. • All 59 semantic patches in the Linux kernel use both. 17 Impact: Changed lines lines of code (log scale) 10 10 10 3 5 1 arch block crypto drivers fs Removed lines include init ipc kernel lib Added lines mm net samples security sound tools virt 18 Impact: Maintainer use 300 Cleanups Bug fixes 200 number 100 0 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 45% of maintainers who have at least one commit touching at least 100 files have at some point used Coccinelle. 19 Impact: Maintainer use examples TTY. Remove an unused function argument. • 11 affected files. DRM. Eliminate a redundant field in a data structure. • 54 affected files. Interrupts. Prepare to remove the irq argument from interrupt handlers, and then remove that argument. • 188 affected files. 20 Impact: Intel’s 0-day build-testing service 59 semantic patches in the Linux kernel with a dedicated make target. api free iterators locks null tests misc 400 200 0 # with patches 2013 2014 2015 2016 2017 200 100 0 2013 2014 2015 2016 2017 # with message only 21 Coccinelle community 25 contributors • Most at Inria, due to use of OCaml and PL concepts. • Active mailing list. Availability • Packaged for many Linux distros. Use outside Linux • RIOT, systemd, qemu, etc. 22 Conclusion: Lessons learned • Visibility is necessary. • Tool should be easy to access and install. • Tool should be easy to use and robust. • Interleaving pattern matching and scripts is very powerful. • Avoid creeping featurism: Do one thing and do it well. http://coccinelle.lip6.fr 23.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    29 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us