On Snakes and Elephants Using Python Inside Postgresql

On Snakes and Elephants Using Python Inside Postgresql

On snakes and elephants Using Python inside PostgreSQL Jan Urba´nski [email protected] New Relic PyWaw Summit 2015, Warsaw, May 26 Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 1 / 32 For those following at home Getting the slides $ wget http://wulczer.org/pywaw-summit.pdf Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 2 / 32 1 Introduction Stored procedures PostgreSQL's specifics 2 The PL/Python language Implementation Examples 3 Using PL/Python Real-life applications Best practices Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 3 / 32 Outline 1 Introduction Stored procedures PostgreSQL's specifics 2 The PL/Python language 3 Using PL/Python Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 4 / 32 What are stored procedures I procedural code callable from SQL I used to implement operations that are not easily expressed in SQL I encapsulate business logic Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 5 / 32 Stored procedure examples Calling stored procedures SELECT purge_user_records(142); SELECT lower(username) FROM users; CREATE TRIGGER notify_user_trig AFTER UPDATEON users EXECUTE PROCEDURE notify_user(); Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 6 / 32 Stored procedure languages I most RDBMS have one blessed language in which stored procedures can we written I Oracle has PL/SQL I MS SQL Server has T-SQL I but Postgres is better Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 7 / 32 Stored procedures in Postgres I a stored procedure in Postgres is I information about input and output types I metadata like the name, the owner, additional modifiers I finally, a bunch of text I a procedural language in Postgres is just a C extension module exposing a single function I its job is to execute that piece of text, accepting and producing the perscribed types Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 8 / 32 Outline 1 Introduction Stored procedures PostgreSQL's specifics 2 The PL/Python language 3 Using PL/Python Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 9 / 32 Extensibility is king I stored procedures in Postgres can be written in any language... I ... as long as a handler has been defined for it I several languages are officially supported I PL/pgSQL, a PL/SQL look-alike I PL/Tcp, PL/Perl and PL/Python I and there's a lot of unofficial ones I Ruby, Lua, PHP, Java, Scheme, V8, R... Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 10 / 32 Trusted vs untrusted languages I once installed, trusted languages are available to all users I for example, PL/pgSQL or PL/V8 I they need to provide a sandboxed execution environment for arbitrary user code I the ability to create untrusted language functions is limited to database superusers Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 11 / 32 Outline 1 Introduction 2 The PL/Python language Implementation Examples 3 Using PL/Python Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 12 / 32 I but that's the fun of it! What PL/Python actually is I the ability to run a Python interpreter inside the backend I runs as the backend's OS user, so untrusted I can run arbitrary Python code, including doing very nasty or really crazy things Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 13 / 32 What PL/Python actually is I the ability to run a Python interpreter inside the backend I runs as the backend's OS user, so untrusted I can run arbitrary Python code, including doing very nasty or really crazy things I but that's the fun of it! Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 13 / 32 How does it work I the first time a PL/Python function is run, a Python interpreter is initialised inside the backend process I preload plpython.so to avoid initial slowdown I use long-lived connections to only pay the overhead once I Postgres types are transformed into Python types and vice versa I only works for built-in types, the rest gets passed using the string representation Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 14 / 32 How does it work cd. I function arguments are visible as global variables I the function has access to various magic globals that describe the execution environment I the plpy module providing database access and utility functions I a dictionary with the old and new tuples if called as a trigger I dictionaries kept in memory between queries, useful for caches I the module path depends on the server process's PYTHONPATH Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 15 / 32 Outline 1 Introduction 2 The PL/Python language Implementation Examples 3 Using PL/Python Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 16 / 32 PL/Python examples Using Python modules create function histogram(a float[], bins int= 10) returns int[] as $$ import numpy return numpy.histogram(a, bins)[0] $$ language plpythonu; Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 17 / 32 PL/Python examples Using Python modules cd. create function get_host(url text) returns text as $$ import urlparse return urlparse.urlparse(url).netloc $$ language plpythonu; Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 18 / 32 PL/Python utility functions Utility functions create function find_table(name text) returns text[] as $$ import difflib sql='select tablename from pg_tables' result= plpy.execute(sql) all_names=[table['tablename'] for table in result] return difflib.get_close_matches(name, all_names) $$ language plpythonu; Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 19 / 32 PL/Python utility functions Utility functions cd. create function add_unique_user(name text, email text) returns text as $$ lname, lemail= name, email plan= plpy.prepare( 'insert into users(name, email) values ($1, $2)', ('text','text')) while True: try: plpy.execute(plan,(lname, lemail)) except plpy.spiexceptions.UniqueViolation: lname= lname+'_' else: return lname $$ language plpythonu; Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 20 / 32 PL/Python global dictionaries Global dictionaries create function get_mx(domain text) returns text as $$ import DNS, time mx, expires=GD.get(domain,(None,0)) if mx and time.time()< expires: return mx GD[domain]= DNS.mxlookup(domain)[0][1], time.time()+5 returnGD[domain][0] $$ language plpythonu; Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 21 / 32 PL/Python advanced examples Avanced examples create function check_mx() returns trigger as $$ import DNS domain=TD['new']['email'].split('@',1)[1] try: DNS.mxlookup(domain) or plpy.error('no mx') except DNS.ServerError: plpy.error('lookup failed for domain %s'% domain) $$ language plpythonu; Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 22 / 32 PL/Python advanced examples Avanced examples cd. create function schedule(source text, summary out text, location out text, start out timestamptz) returns setof record as $$ import icalendar, requests resp= requests.get(source) cal= icalendar.Calendar.from_ical(resp.content) for event in cal.walk('VEVENT'): yield(event['SUMMARY'], event['LOCATION'], event['DTSTART'].dt.isoformat()) $$ language plpythonu; Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 23 / 32 Outline 1 Introduction 2 The PL/Python language 3 Using PL/Python Real-life applications Best practices Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 24 / 32 Where to use PL/Python I writing a particular piece of logic in a nicer language than PL/pgSQL I doing numerical computations in the database with NumPy I doing text analysis with NLTK I writing a constraint that checks if a column contains JSON I or a protobuf stream I or a PNG image Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 25 / 32 Crazier ideas I implementing a simple cache layer right in the database I connecting to other Postgres instances and doing things to them I communicating with external services to invalidate caches or trigger actions Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 26 / 32 Multicorn I not really PL/Python, but similar idea under the hood I uses the foreign data wrapper mechanism I foreign data wrappers are a way to present data residing in other storages as if they were local tables I can use arbitrary Python code to provide unified access to disparate sources Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 27 / 32 Multicorn example Filesystem access create foreign table python_fs( package text, module text, content bytea) server filesystem_srv options( root_dir'/usr/lib/python2.7/dist-packages', pattern'{package}/{module}.py', content_column'content'); Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 28 / 32 Multicorn modules I out of the box, Multicord provides modules for filesystem, IMAP, LDAP, SQLAlchemy and a few more I but it's easy to write your own! I perfect for prototyping production-grade foreign data wrappers Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 29 / 32 Outline 1 Introduction 2 The PL/Python language 3 Using PL/Python Real-life applications Best practices Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 30 / 32 Organising PL/Python code I keep your PL/Python code in a module I make all your SQL functions two-liners I test the Python code by mocking out magic variables I it's a sharp tool, be careful Jan Urba´nski (New Relic) On snakes and elephants PyWaw Summit 31 / 32 Organising PL/Python code

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    35 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us