Flask Meetup Data Scraper Documentation Release 0.1

Flask Meetup Data Scraper Documentation Release 0.1

Flask Meetup Data Scraper Documentation Release 0.1 Steffen Exler Feb 04, 2020 Contents 1 Intro 3 2 Getting started 7 2.1 Development & Production Version...................................7 2.2 Quick install (Development Version)..................................7 2.3 Quick install (Production Version)....................................8 3 Usage Guide 11 3.1 CLI.................................................... 11 4 Advanced topics 15 4.1 Changing Models............................................. 15 4.2 Documentation.............................................. 15 5 IDE 17 5.1 Recommanded Extensions for VS-Code................................. 17 5.2 Install Python............................................... 17 5.3 Install Python dependencies....................................... 18 5.4 Code Format............................................... 19 6 Rest API Documentation 21 6.1 About................................................... 21 6.2 Suggestion................................................ 21 6.3 Search.................................................. 22 7 C/I 27 7.1 About................................................... 27 7.2 Code Style................................................ 27 7.3 Tests................................................... 27 7.4 Documentation.............................................. 27 7.5 Code Review............................................... 28 7.6 Dependencies............................................... 28 8 Elasticsearch Queries 29 9 Frontend 31 10 Testing 33 i 10.1 Pytest................................................... 33 10.2 Conftest.................................................. 33 10.3 Create new test.............................................. 33 10.4 Execute tests............................................... 34 11 Troubleshooting 35 11.1 max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]. 35 11.2 Build faild -> out of memory....................................... 35 11.3 Error when starting container under Windows.............................. 36 12 FAQ 37 12.1 What are the minimum hardware requirements?............................. 37 12.2 How to set the domain for a production site?.............................. 37 13 Indices & Tables 39 Index 41 ii Flask Meetup Data Scraper Documentation, Release 0.1 Table of Contents: Contents 1 Flask Meetup Data Scraper Documentation, Release 0.1 2 Contents CHAPTER 1 Intro Fulltext Meetup.com Search engine, download & index every meetup Group and the Group Events in a region every 28 days. With the fulltext meetup search is it possible to search in every discription and any other field. So you know in wich places you done a talk. Fig. 1: Problem: In wich meetups did my team a talk? Because it is not possible to create a fulltext search request on the official meetup API, so a way to solve the issue is 3 Flask Meetup Data Scraper Documentation, Release 0.1 to download relevant meetup groups and index them into a elasticsearch. Fig. 2: Solution: Download every relevant group from meetup and index them into elasticsearch! The Dataflow concept is that the API Server, wich is written in Python with the Flask webframework, download every 28 days all relevant meetup groups with there events and index them into elasticsearch. The search user use an angular app to communicate with the API server. For an easy deployment the angular app has it’s own NGINX based docker container and the traefik container route every http & https traffik to the angular container expect PUT request. Put request are routet to the API server. Also traefik secure the communication with the enduser via SSL and used to handle basic auth request for the frontend & backend! 4 Chapter 1. Intro Flask Meetup Data Scraper Documentation, Release 0.1 5 Fig. 3: DataFlow Flask Meetup Data Scraper Documentation, Release 0.1 6 Chapter 1. Intro CHAPTER 2 Getting started Note: These instructions assume familiarity with Docker and Docker Compose. 2.1 Development & Production Version The Project comes with 2 different Docker-Compose files wich are for development local.yml and production production.yml. The development version start the website in debug mode and bind the local path ./ to the flask docker contaiers path /app. For the production version, the docker container is build with the code inside of the container. Also the production version use redis as caching backend. 2.2 Quick install (Development Version) Build the docker container. $ docker-compose -f local.yml build Migrate all models into Elasticsearch $ docker-compose -f local.yml run flask flask migrate_models Load the Meetup Sandbox Group with all events. $ docker-compose -f local.yml run flask flask get_group --sandbox True Start the website. 7 Flask Meetup Data Scraper Documentation, Release 0.1 $ docker-compose -f local.yml up Now the server is listen on http://localhost:5000 for any REST API requests. 2.3 Quick install (Production Version) 2.3.1 Settings At first create the directory ./.envs/.production $ mkdir ./.envs/.production` For flask container create a file ./.envs/.production/.flask wich should look like: To fill the section # Meetup.com OAuth you need an API account from Meetup.com. When setting up the meetup oauth account add https://you-domain.com/callback as your callback url & with https:// you-domain.com/login you can log in with your meetup account. To read how to handle a boundingbox in the section # Meetup.com groups region go to load_zip_codes. For Elasticsearch container create a file ./.envs/.production/.elasticsearch wich should look like be- low. For further information how to setup Elasticsearch with enviroment vars got to https://www.elastic.co/guide/en/ elasticsearch/reference/current/settings.html 2.3.2 Add Users Frontend & backend has the same endpoint for user authentification. Both use Basic Auth from traefik. To add a user, use htpasswd and store the user data into compose/production/traefik/basic-auth-usersfile. Example use in Linux: $ sudo apt install apache2-utils # install htpasswd $ htpasswd -c compose/production/traefik/basic-auth-usersfile username 2.3.3 Setup Build the docker container. $ docker-compose -f production.yml build Create the elasticsearch indexes. $ docker-compose -f production.yml run flask flask migrate_models Load Meetuup zip codes for a country. $ docker-compose -f production.yml run flask flask load_zip_codes 47.2701114 55. ,!0991615.8663153 15.0418087 # germany $ docker-compose -f production.yml run flask flask load_zip_codes 45.817995 47. ,!80846485.9559113 10.4922941 # switzerland $ docker-compose -f production.yml run flask flask load_zip_codes 46.3722761 49. ,!02053059.5307487 17.160776 # austria Load Meetuup zip codes for a country. 8 Chapter 2. Getting started Flask Meetup Data Scraper Documentation, Release 0.1 $ docker-compose -f production.yml run flask flask load_zip_codes 47.2701114 55. ,!0991615.8663153 15.0418087 # germany $ docker-compose -f production.yml run flask flask load_zip_codes 45.817995 47. ,!80846485.9559113 10.4922941 # switzerland $ docker-compose -f production.yml run flask flask load_zip_codes 46.3722761 49. ,!02053059.5307487 17.160776 # austria Start the website. $ docker-compose -f production.yml up -d 2.3.4 Conjob Add a cronjob to run every week. So that every 4 weeks the elasticsearch index should be resetet. If you want a another periode change the 4 with your periode time. But don’t use a persiod over 30 days! It is forbidden by meetup.com!!: 03 ** 6 docker-compose-f production.yml run flask flask ,!reset_index--reset_periode4 Description what does this command do, is under reset_index. 2.3. Quick install (Production Version) 9 Flask Meetup Data Scraper Documentation, Release 0.1 10 Chapter 2. Getting started CHAPTER 3 Usage Guide Note: The usage guide is written for the development instance, to use the commands for prodcution change -f local.yml to -f production.yml for every command! 3.1 CLI 3.1.1 Help For showing a general help for all commands run: $ docker-compose -f local.yml run flask flask To display a helptext for a single command, add to the command --help like: $ docker-compose -f local.yml run flask flask shell --help 3.1.2 migrate_models To use elasticsearch search, the index for the groups & zip codes musst be set. To do this run: $ docker-compose -f local.yml run flask flask migrate_models 3.1.3 Load single group from Meetup.com To download a single group from Meetup.com and index them into elasticsearch use get_group. For down- loading a group use the urlname. When you are on a Meetup group page like https://www.meetup.com/de-DE/ Meetup-API-Testing/, the urlname is in the browser URL field after the language-code block. 11 Flask Meetup Data Scraper Documentation, Release 0.1 In the example for Meetup-API-Testing group, the language-code is de-DE and the urlname is Meetup-API-Testing. Fig. 1: Meetup group urlname screenshot Also when you search for new groups in the meetup.com rest api, the urlname has it extra field for request a group entpoint. So to download & index the Meetup-API-Testing group use: $ docker-compose -f local.yml run flask flask get_group Meetup-API-Testing The output should look like: Starting flask-meetup-data-scraper_elasticsearch_1 ... done Elasticsearch is available Group Meetup API Testing Sandbox was updatet with 761 events For testing purpose it’s possible to load the sandbox group with the param --sandbox True, than it’s not required to add a urlname.: $ docker-compose -f local.yml run flask flask get_group --sandbox

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    45 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us