Data Retrieval
http://www.billboard.com/charts/hot-100/2016-09-17
http://www.billboard.com/fe-ajax/spotify/preview/chart-tracks/HSI/2016-09-17/US

getRanking[xml_, maxRank_] := Cases[xml,
  XMLElement["div", {"class" -> "chart-row__main-display"}, {___,
     XMLElement["div", {"class" -> "chart-row__rank"}, {___,
        XMLElement["span", {"class" -> "chart-row__current-week"}, {rank_}], ___}], ___,
     XMLElement["div", {"class" -> "chart-row__container"}, {___,
        XMLElement["div", {"class" -> "chart-row__title"}, {___,
           XMLElement[_, {"class" -> "chart-row__song"}, {name_}], ___,
           XMLElement[_, {___, "class" -> "chart-row__artist", ___}, {artist_}], ___}], ___}], ___}] :>
   <|"Rank" -> ToExpression[rank],
     "Score" -> maxRank + 1 - ToExpression[rank],
     "Name" -> name,
     "ArtistName" -> StringReplace[artist, StartOfString ~~ WhitespaceCharacter... -> ""]|>,
  Infinity]
getRanking[url_String, maxRank_] := getRanking[Import[URLRead[url], CharacterEncoding -> "UTF8" …
getRankingHot100[date_DateObject] := getRanking[URLBuild[{"http://www.billboard.com", "charts", …
getRankingBillboard200[date_DateObject] := getRanking[URLBuild[{"http://www.billboard.com", …
hot100StartDate = DateObject[{1958, 8, 9}];
billboard200StartDate = DateObject[{1963, 8, 24}];

Hot 100

Floor[(Today - hot100StartDate)/Quantity[7, "Days"]]
3032

hot100Raw = Monitor[
   Table[
    With[{date = hot100StartDate + Quantity[7, "Days"]*n},
     <|"Date" -> date, "Rankings" -> getRankingHot100[date]|>],
    {n, 0, Floor[(Today - hot100StartDate)/Quantity[7, "Days"]]}],
   n];
(*DumpSave["~/Dropbox/mathematica/hot100/hot100Raw.mx", hot100Raw];*)
hot100 = Dataset[hot100Raw];
(*DumpSave["~/Dropbox/mathematica/hot100/hot100.mx", hot100];*)

Billboard 200

Floor[(Today - billboard200StartDate)/Quantity[7, "Days"]]
2769

billboard200Raw = Monitor[
   Table[
    With[{date = billboard200StartDate + Quantity[7, "Days"]*n},
     <|"Date" -> date, "Rankings" -> getRankingBillboard200[date]|>],
    {n, 0, Floor[(Today - billboard200StartDate)/Quantity[7, "Days"]]}],
   n];
(*DumpSave["~/Dropbox/mathematica/hot100/billboard200Raw.mx", billboard200Raw];*)
billboard200 = Dataset[billboard200Raw];
(*DumpSave["~/Dropbox/mathematica/hot100/billboard200.mx", billboard200];*)

Notes
The Hot 100 data on billboard.com itself is full of bugs!
Here are the missing ranks, with links to check them on the original site:

hot100[DeleteCases[{{}, _}],
 {Query["Rankings" /* (Complement[Range[100], #] &), All, "Rank"],
  "Date" /* (Hyperlink@URLBuild[{"http://www.billboard.com", "charts", "hot-100",
        DateString[#, {"Year", "-", "Month", "-", "Day"}]}] &)}]

{51, 64, 75}  http://www.billboard.com/charts/hot-100/1958-08-09
{10, 82}  http://www.billboard.com/charts/hot-100/1958-08-16
{6, 72, 82}  http://www.billboard.com/charts/hot-100/1958-08-23
{9}  http://www.billboard.com/charts/hot-100/1958-08-30
{29}  http://www.billboard.com/charts/hot-100/1960-03-19
{35}  http://www.billboard.com/charts/hot-100/1961-02-04
{29}  http://www.billboard.com/charts/hot-100/1961-04-08
{19}  http://www.billboard.com/charts/hot-100/1961-04-15
{69}  http://www.billboard.com/charts/hot-100/1961-08-26
{59}  http://www.billboard.com/charts/hot-100/1961-09-02
{64}  http://www.billboard.com/charts/hot-100/1961-09-16
{45}  http://www.billboard.com/charts/hot-100/1961-09-23
{40}  http://www.billboard.com/charts/hot-100/1961-09-30
{38}  http://www.billboard.com/charts/hot-100/1961-10-07
{25}  http://www.billboard.com/charts/hot-100/1961-10-14
{46}  http://www.billboard.com/charts/hot-100/1961-10-21
{38}  http://www.billboard.com/charts/hot-100/1961-10-28
{38}  http://www.billboard.com/charts/hot-100/1961-11-04
{36}  http://www.billboard.com/charts/hot-100/1961-11-11
{37}  http://www.billboard.com/charts/hot-100/1961-11-18
(showing 1–20 of 99)

Some weeks of the Billboard 200 have no entries at all:

billboard200[ListLinePlot, "Rankings" /* Length]
[Line plot: number of entries in each weekly Billboard 200 chart, by week index]

billboard200[Position[0], "Rankings" /* Length]
672, 1115, 1145, 1148, 1149, 1150, 1153, 1154, 1155, 1158, 1159, 1160, 1163, 1164, 1165, 1197, 1306, 1316

Analysis

Initialization

Constants
Get["~/Dropbox/mathematica/hot100/hot100.mx"]
Get["~/Dropbox/mathematica/hot100/billboard200.mx"]
hot100Entries = hot100[Map[Map[Prepend["Date" -> #Date], #Rankings] &] /* Catenate];
hot100Artists = hot100Entries[GroupBy["ArtistName"]];
hot100Songs = hot100Entries[GroupBy[{#Name, #ArtistName} &]];
hot100Years = hot100[GroupBy[DateObject[{DateValue[#Date, "Year"]}] &]];
billboard200Entries = billboard200[Map[Map[Prepend["Date" -> #Date], #Rankings] &] /* Catenate];
billboard200Artists = billboard200Entries[GroupBy["ArtistName"]];
billboard200Albums = billboard200Entries[GroupBy[{#Name, #ArtistName} &]];
billboard200Years = billboard200[GroupBy[DateObject[{DateValue[#Date, "Year"]}] &]];

Functions
fillDates[data_Association] := Merge[{Association@Thread[Normal@hot100[All, "Date"] -> 0], data}, Last]
fillDates[data_List] := KeyValueMap[List, fillDates@Association[Rule @@@ data]]
fillDates[data_Dataset] := fillDates[Normal[data]]

Basic Info

Distribution of artist scores:
hot100Artists[Histogram, Total, "Score"]
[Histogram of total Hot 100 score per artist]

hot100Artists[All, Total, "Score"][Sort /* Reverse]
Madonna 57,370
Elton John 56,042
Taylor Swift 49,685
Mariah Carey 45,172
Stevie Wonder 42,555
The Beatles 42,470
Michael Jackson 39,697
Rod Stewart 37,863
Whitney Houston 37,742
The Rolling Stones 37,095
Rihanna 36,746
Chicago 35,663
Aretha Franklin 34,880
Billy Joel 34,705
Daryl Hall John Oates 33,576
Bee Gees 32,890
Neil Diamond 32,711
Kelly Clarkson 32,707
R. Kelly 30,776
Janet Jackson 30,727
(showing 1–20 of 9,146)

hot100Artists[All, Total, "Score"][WordCloud]
[Word cloud of artist names, weighted by total score]

Weeks at #1 by artist:
hot100[Counts /* Histogram, "Rankings", 1, "ArtistName"]
[Histogram of the number of weeks each artist spent at #1]

hot100[Counts /* Sort /* Reverse, "Rankings", 1, "ArtistName"]
Mariah Carey 60
The Beatles 54
Boyz II Men 34
Madonna 32
Whitney Houston 31
Michael Jackson 30
The Black Eyed Peas 28
Bee Gees 27
Adele 24
Elton John 23
Usher 22
Janet Jackson 21
Elvis Presley With The Jordanaires 21
The Supremes 19
TLC 18
Katy Perry 18
The Rolling Stones 17
Rod Stewart 17
Olivia Newton-John 17
The 4 Seasons 16
(showing 1–20 of 701)

Most common words in song titles:
hot100Entries[Catenate /* DeleteStopwords /* WordCloud, "Name" /* TextWords /* ToLowerCase]
$Aborted

Most common words in the titles of #1 songs:
hot100[Catenate /* DeleteStopwords /* WordCloud, "Rankings", 1, "Name" /* ToLowerCase /* TextWords]
[Word cloud of words in #1 song titles]

Songs with the highest scores:
hot100Songs[All, Total, "Score"][Sort /* Reverse]
Radioactive – Imagine
How Do I Live – LeAnn
Foolish Games/You Were Meant For Me – Jewel
I'm Yours – Jason
Party Rock Anthem – LMFAO
Counting Stars – OneRepublic
Uptown Funk! – Mark
Rolling In The Deep – Adele
Smooth – Santana
Somebody That I Used To Know – Gotye
I Gotta Feeling – The
Need You Now – Lady
Dark Horse – Katy
Before He Cheats – Carrie
Truly Madly Deeply – Savage
Too Close – Next
Macarena(Bayside Boys Mix) – Los
Thinking Out Loud – Ed Sheeran
Ho Hey – The
Trap Queen – Fetty
(showing 1–20 of 27,459)

#1 songs by artist:
hot100Songs[Select[Min[#[[All, "Rank"]]] === 1 &] /* Counts, 1, "ArtistName"][Sort /* Reverse]
The Beatles 19
Mariah Carey 15
Madonna 12
Whitney Houston 11
Michael Jackson 11
The Supremes 10
Bee Gees 9
The Rolling Stones 8
Stevie Wonder 7
Janet Jackson 7
Rihanna 6
Phil Collins 6
Katy Perry 6
Elvis Presley With The Jordanaires 6
Daryl Hall John Oates 6
Usher 5
Paula Abdul 5
KC And The Sunshine Band 5
George Michael 5
Elton John 5
(showing 1–20 of 701)

Artists Over Time

The Beatles
hot100[DateListPlot[#, PlotRange -> All] &, {"Date", Query["Rankings", Select[#ArtistName === "The Beatles" &] /* Total, "Score"]}]
[Date plot: The Beatles' total Hot 100 score per chart week]

hot100[DateListPlot[#, PlotRange -> {{DateObject[{1960}], DateObject[{1975}]}, All}] &, {"Date", Query["Rankings", Select[#ArtistName === "The Beatles" &] /* Total, "Score"]}]
[The same plot restricted to 1960–1975]
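The weeks-at-#1 counts and The Beatles' score curve both reduce to simple list operations on the weekly records. The following sketch reproduces them on made-up data shaped like hot100Raw, which is a list of associations with "Date" and "Rankings" keys. It uses no Dataset and no network access. toyRaw, weeksAtNumberOne and artistScoreSeries are hypothetical names introduced for illustration. One deliberate difference from the notebook: the #1 artist is chosen by its "Rank" value rather than by list position ("Rankings", 1). As a result, a week whose #1 entry is missing contributes nothing.

```mathematica
(* Made-up toy data in the shape of hot100Raw: a 2-position chart, so
   Score = 2 + 1 - Rank. Week 4 is deliberately empty. *)
toyRaw = {
   <|"Date" -> DateObject[{2000, 1, 1}], "Rankings" -> {
      <|"Rank" -> 1, "Score" -> 2, "Name" -> "Song A", "ArtistName" -> "X"|>,
      <|"Rank" -> 2, "Score" -> 1, "Name" -> "Song B", "ArtistName" -> "Y"|>}|>,
   <|"Date" -> DateObject[{2000, 1, 8}], "Rankings" -> {
      <|"Rank" -> 1, "Score" -> 2, "Name" -> "Song A", "ArtistName" -> "X"|>,
      <|"Rank" -> 2, "Score" -> 1, "Name" -> "Song B", "ArtistName" -> "Y"|>}|>,
   <|"Date" -> DateObject[{2000, 1, 15}], "Rankings" -> {
      <|"Rank" -> 1, "Score" -> 2, "Name" -> "Song B", "ArtistName" -> "Y"|>,
      <|"Rank" -> 2, "Score" -> 1, "Name" -> "Song A", "ArtistName" -> "X"|>}|>,
   <|"Date" -> DateObject[{2000, 1, 22}], "Rankings" -> {}|>};

(* Hypothetical helper: weeks at #1 per artist, largest count first.
   Entries are picked by their "Rank" value, so an empty week (or one
   whose #1 is missing) contributes nothing. *)
weeksAtNumberOne[raw_List] :=
  Reverse@Sort@Counts@Flatten[
     Select[#["Rankings"], #["Rank"] === 1 &][[All, "ArtistName"]] & /@ raw]

(* Hypothetical helper: {date, total score of one artist} for every week;
   weeks without that artist total to 0. *)
artistScoreSeries[raw_List, artist_String] :=
  {#Date, Total[Select[#Rankings, #ArtistName === artist &][[All, "Score"]]]} & /@ raw

weeksAtNumberOne[toyRaw]
(* <|"X" -> 2, "Y" -> 1|> *)

artistScoreSeries[toyRaw, "Y"][[All, 2]]
(* {1, 1, 2, 0} *)
```

Assuming hot100 has been loaded, DateListPlot[artistScoreSeries[Normal[hot100], "The Beatles"], PlotRange -> All] should draw a curve comparable to the Dataset version above. Counts from weeksAtNumberOne[Normal[hot100]] may differ slightly from the position-based table in any week where the scraped list does not start with rank 1.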
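The retrieval and cleaning steps rest on a few conventions:

- An entry's score is maxRank + 1 - Rank.
- Weekly dates step by 7 days from a start date.
- Chart pages are addressed by a Year-Month-Day date string.
- Weeks are checked for missing ranks and for having no entries at all.

The following minimal sketch replays these conventions offline on made-up data. scoreFromRank, weekDate, chartURL, missingRanks, emptyWeeks, flattenEntries and toyWeeks are hypothetical names, not part of the notebook. Nothing is downloaded, and the helpers work on plain lists and associations instead of Dataset queries.

```mathematica
(* Hypothetical helper: the scoring rule used in getRanking; on a chart with
   maxRank positions, #1 earns maxRank points and last place earns 1. *)
scoreFromRank[rank_Integer, maxRank_Integer] := maxRank + 1 - rank

(* Hypothetical helper: the n-th weekly chart date after start (n = 0 is start). *)
weekDate[start_DateObject, n_Integer] := DatePlus[start, {7 n, "Day"}]

(* Hypothetical helper: a chart page URL built with URLBuild and a
   Year-Month-Day date string, the same pattern as the Hyperlink column above. *)
chartURL[chart_String, date_DateObject] :=
  URLBuild[{"http://www.billboard.com", "charts", chart,
    DateString[date, {"Year", "-", "Month", "-", "Day"}]}]

(* Made-up toy data for a 3-position chart: rank 2 is missing in week 1,
   and week 2 has no entries at all. *)
toyWeeks = {
   <|"Date" -> DateObject[{2000, 1, 1}], "Rankings" -> {
      <|"Rank" -> 1, "Score" -> 3, "Name" -> "Song A", "ArtistName" -> "X"|>,
      <|"Rank" -> 3, "Score" -> 1, "Name" -> "Song B", "ArtistName" -> "Y"|>}|>,
   <|"Date" -> DateObject[{2000, 1, 8}], "Rankings" -> {}|>};

(* Hypothetical helper: ranks 1..maxRank that do not appear in a week. *)
missingRanks[week_Association, maxRank_Integer] :=
  Complement[Range[maxRank], #["Rank"] & /@ week["Rankings"]]

(* Hypothetical helper: 1-based positions of weeks with no entries. *)
emptyWeeks[raw_List] := Flatten@Position[Length[#["Rankings"]] & /@ raw, 0]

(* Hypothetical helper: one flat list of entries, each tagged with its
   chart date (the same idea as hot100Entries). *)
flattenEntries[raw_List] :=
  Catenate[Function[week, Prepend[#, "Date" -> week["Date"]] & /@ week["Rankings"]] /@ raw]

scoreFromRank[1, 100]
(* 100 *)
chartURL["hot-100", weekDate[DateObject[{1958, 8, 9}], 1]]
(* "http://www.billboard.com/charts/hot-100/1958-08-16" *)
missingRanks[toyWeeks[[1]], 3]
(* {2} *)
emptyWeeks[toyWeeks]
(* {2} *)
```

The URL produced for week 1 after hot100StartDate is the 1958-08-16 address listed among the problem weeks above. Positions returned by emptyWeeks are 1-based, while the Table loop's n starts at 0, so week position p corresponds to n = p - 1.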