
Amazing SQL your ORM can (or can’t) do Louise Grandjonc - pgconfEU 2019 @louisemeta About me Software Engineer at Citus Data / Microsoft Python developper Postgres enthusiast @louisemeta and @citusdata on twitter www.louisemeta.com [email protected] @louisemeta Why this talk • I am a developper working with other developers • I write code for applications using postgres • I love both python and postgres • I use ORMs • I often attend postgres conferences • A subject that I always enjoy is complex and modern SQL • I want to use my database at its full potential @louisemeta Today’s agenda 1. Basic SQL reminder 2. Aggregate functions 3. GROUP BY 4. EXISTS 5. Subquery 6. LATERAL JOIN 7.Window Functions 8. GROUPING SETS / ROLLUP / CUBE 9. CTE (Common Table Expression) @louisemeta Data model @louisemeta Dataset • 10 artists: Kyo, Blink-182, Maroon5, Jonas Brothers, Justin Timberlake, Avril Lavigne, Backstreet Boys, Britney Spears, Justin Bieber, Selena Gomez • 72 albums • 1012 songs • 120,029 words: transformed the lyrics into tsvector and each vector (> 3 letters) went into a kyo_word row with its position. @louisemeta ORMs we’ll talk about today • Django ORM (python) • SQLAlchemy (python) • Activerecord (ruby) • Sequel (ruby) @louisemeta A quick tour on basic SQL SELECT columns FROM table_a (INNER, LEFT, RIGHT, OUTER) JOIN table_a ON table_a.x = table_b.y WHERE filters ORDER BY columns LIMIT X @louisemeta A quick tour on basic SQL Django ORM (python) Artist.objects.get(pk=10) Sqlalchemy (python) session.query(Artist).get(10) SELECT id, name FROM kyo_artist WHERE id=10; Activerecord (ruby) Artist.find(10) Sequel (ruby) Artist[10] @louisemeta A quick tour on basic SQL Django ORM (python) Sqlalchemy (python) Album.objects session.query(Album) .filter(year_gt=2009) .filter(Album.year > 2009) .order_by(‘year’) .order_by(Album.year) Activerecord (ruby) Sequel (ruby) Album.where('year > ?', 2009) Album.where(Sequel[:year] > 2009) .order(:year) .order(:year) SELECT id, name, year, popularity FROM kyo_album WHERE year > 2009 ORDER BY year; @louisemeta A quick tour on basic SQL Django ORM (python) Sqlalchemy (python) Album.objects session.query(Album) .filter(year_gt=2009) .join(‘artist') .order_by(‘year’) .filter(Album.year > 2009) .select_related(‘artist’) .order_by(Album.year) Activerecord (ruby) Sequel (ruby) Album.where('year > ?', 2009) Album.where(Sequel[:year] > 2009) .joins(:artist) .join(:artist) .order(:year) .order(:year) SELECT id, name, year, popularity FROM kyo_album INNER JOIN kyo_artist ON kyo_artist.id=kyo_album.artist_id WHERE year > 2009 ORDER BY year; @louisemeta A quick tour on basic SQL Django ORM (python) Album.objects .filter(artist_id=12)[5:10] Sqlalchemy (python) session.query(Album) .filter(Album.artist_id == 12) SELECT id, name, year, popularity .offset(5).limit(10) FROM kyo_album WHERE artist_id = 12 Activerecord (ruby) ORDER BY id OFFSET 5 LIMIT 10; Album.where(artist_id: 12) .limit(10).offset(5) To go further in pagination: Sequel (ruby) https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/ Album.where(artist_id: 12) .limit(10).offset(5) @louisemeta Executing RAW SQL queries Django Word.objects.raw(query, args) with connection.cursor() as cursor: cursor.execute(query) data = cursor.fetchall(cursor) @louisemeta Executing RAW SQL queries SQLAlchemy engine = create_engine(‘postgres://localhost:5432/kyo_game’) with engine.connect() as con: rs = con.execute(query, **{‘param1’: ‘value’}) rows = rs.fetchall() @louisemeta Executing RAW SQL queries Activerecord words = Word.find_by_sql ['SELECT * FROM words WHERE artist_id=:artist_id', {:artist_id => 14}] rows = ActiveRecord::Base.connection.execute(sql, params) The functions select/where/group also can take raw SQL. @louisemeta Executing RAW SQL queries Sequel DB = Sequel.connect(‘postgres://localhost:5432/kyo_game_ruby') DB['select * from albums where name = ?', name] @louisemeta Average popularity of Maroon5’s albums SELECT AVG(popularity) FROM kyo_album WHERE artist_id=9; # Django popularity = Album.objects.filter(artist_id=9).aggregate(value=Avg(‘popularity’) # sqlalchemy session.query(func.avg(Album.popularity).label(‘average')) .filter(Album.artist_id == 9) {'value': 68.16666666667} @louisemeta Average popularity of Maroon5’s albums Ruby - ActiveRecord/Sequel SELECT AVG(popularity) FROM kyo_album WHERE artist_id=9; #Activerecord Album.where(artist_id: 9).average(:popularity) # Sequel Album.where(artist_id: 9).avg(:popularity) @louisemeta Words most used by Justin Timberlake with the number of songs he used them in Word Number of occurrences Number of song love 503 56 know 432 82 like 415 68 girl 352 58 babi 277 59 come 227 58 caus 225 62 right 224 34 yeah 221 54 @louisemeta Words most used by Justin Timberlake SELECT COUNT(id) AS total, FROM kyo_word 17556 WHERE artist_id = 11; Word Number of love occurrences503 SELECT value, know 432 COUNT(id) AS total like 415 FROM kyo_word girl 352 WHERE artist_id = 11 babi 277 GROUP BY value ORDER BY total DESC come 227 LIMIT 10 caus 225 right 224 yeah 221 @louisemeta Words most used by Justin Timberlake Django SELECT value, COUNT(id) AS total, COUNT(DISTINCT song_id) AS total_songs FROM kyo_word WHERE artist_id = 11 GROUP BY value ORDER BY total DESC LIMIT 10 Word.objects.filter(artist_id=self.object.pk) .values('value') .annotate(total=Count(‘id'), total_song=Count('song_id', distinct=True)) .order_by('-total')[:10]) @louisemeta Words most used by Justin Timberlake SQLAlchemy SELECT value, COUNT(id) AS total, COUNT(DISTINCT song_id) AS total_songs FROM kyo_word WHERE artist_id = 11 GROUP BY value ORDER BY total DESC LIMIT 10 session.query( Word.value, func.count(Word.value).label(‘total’), func.count(distinct(Word.song_id))) .group_by(Word.value) .order_by(desc('total')).limit(10).all() @louisemeta Words most used by Justin Timberlake Activerecord Word.where(artist_id: 11) .group(:value) .count(:id) Word.where(artist_id: 11) .group(:value) .select('count(distinct song_id), count(id)’) .order('count(id) DESC’) .limit(10) @louisemeta Words most used by Justin Timberlake Sequel Word.where(artist_id: 11) .group_and_count(:value) Word.group_and_count(:value) .select_append{count(distinct song_id).as(total)} .where(artist_id: 11) .order(Sequel.desc(:count)) @louisemeta To go further with aggregate functions AVG COUNT (DISTINCT) Min Max SUM @louisemeta Support with ORMs Activerecord Django SQLAlchemy Sequel (*) AVG Yes Yes Yes Yes COUNT Yes Yes Yes Yes Min Yes Yes Yes Yes Max Yes Yes Yes Yes Sum Yes Yes Yes Yes * Activerecord: to cumulate operators, you will need to use select() with raw SQL @louisemeta Words that Avril Lavigne only used in the song “Complicated” Word dress drivin foolin pose preppi somethin strike unannounc @louisemeta Words that Avril Lavigne only used in the song “Complicated” - Django SELECT * FROM kyo_word word WHERE song_id=342 Filter the result if the AND NOT EXISTS ( subquery returns no row SELECT 1 FROM kyo_word word2 WHERE word2.artist_id=word.artist_id Subquery for the same word2.song_id <> word.song_id value for a word, but AND word2.value=word.value); different primary key same_word_artist = (Word.objects .filter(value=OuterRef(‘value’), artist=OuterRef('artist')) .exclude(song_id=OuterRef(‘song_id’)) context['unique_words'] = Word.objects.annotate(is_unique=~Exists(same_word_artist)) .filter(is_unique=True, song=self.object) @louisemeta Words that Avril Lavigne only used in the song “Complicated” - SQLAlchemy word1 = Word word2 = aliased(Word) subquery = session.query(word2).filter(value == word1.value, artist_id == word1.artist_id, song_id != word1.song_id) session.query(word1).filter(word1.song_id == 342, ~subquery.exists()) @louisemeta Words that Avril Lavigne only used in the song “Complicated” - Activerecord There is an exists method: Album.where(name: ‘Kyo').exists? But in a subquery, we join the same table and they can’t have an alias Word.where(song_id: 342).where( 'NOT EXISTS (SELECT 1 FROM words word2 WHERE word2.artist_id=words.artist_id AND word2.song_id <> words.song_id AND word2.value=words.value)’ ) @louisemeta An example where EXISTS performs better I wanted to filter the songs that had no value in the table kyo_word yet. A basic version could be Song.objects.filter(words__isnull=True) SELECT "kyo_song"."id", "kyo_song"."name", "kyo_song"."album_id", "kyo_song"."language" FROM "kyo_song" LEFT OUTER JOIN "kyo_word" ON ("kyo_song"."id" = "kyo_word"."song_id") WHERE "kyo_word"."id" IS NULL 17-25ms @louisemeta An example where EXISTS performs better And with an EXISTS Song.objects.annotate(processed=Exists(Word.objects.filter(song_id=OuterRef('pk')))) .filter(processed=False) SELECT * , EXISTS(SELECT * FROM "kyo_word" U0 WHERE U0."song_id" = ("kyo_song"."id")) AS "processed" FROM "kyo_song" WHERE EXISTS(SELECT * FROM "kyo_word" U0 WHERE U0."song_id" = ("kyo_song"."id")) = False 4-6ms @louisemeta Detecting the chorus of the song “All the small things” - Blink 182 To make it simple, we want to know what group of two words is repeated more than twice. Word Next word Occurences turn light 4 “Say it ain’t so I will not go light carri 4 Turn the lights off Carry me home” carri home 4 @louisemeta Detecting the chorus of a song Subquery Step 1: Getting the words with their next word SELECT value, ( SELECT U0.value FROM kyo_word U0 Subquery WHERE (U0.position > (kyo_word.position) AND U0.song_id = 441) ORDER BY U0.position ASC LIMIT 1 ) AS "next_word", FROM "kyo_word" WHERE "kyo_word"."song_id" = 441 @louisemeta Detecting the chorus of a song Subquery Step 2: Getting
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages72 Page
-
File Size-