CIP-Group-Project

This repository contains the code and cleaned data for the project “Analyzing Accessibility and Pricing in Swiss Pharmacies: The Case of Painkillers”. The work was created as part of the course Data Collection, Integration and Preprocessing at the Lucerne University of Applied Sciences and Arts (HSLU).

Authors: Andri Gerber¹, Manuela Flury², Remigiusz Piotr Trydulski³

Project Overview

Painkillers are among the most commonly sold medicines in Switzerland, but may only be dispensed directly in pharmacies. To shed light on pricing strategies and accessibility, we scraped product information and opening hours from the three major pharmacy chains Amavita, Coop Vitality and Sunstore. The results are summarised in report.pdf.

Methodology

  1. Data Extraction – Dynamic and static website content was scraped using a combination of Selenium, Scrapy and BeautifulSoup. Separate scrapers collected painkiller data and store opening hours for each chain.
  2. Data Transformation – After loading the raw CSV files with pandas, the datasets were deduplicated, cleaned and enriched with new features such as price per unit and an urban vs. rural classification.
  3. Analysis – The merged datasets were used to compare prices across chains and between generic and branded products. Opening hours were evaluated by canton and by chain to highlight geographic differences.
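Steps 2 and 3 above can be sketched with pandas. The rows and column names below (chain, product, price_chf, units) are hypothetical examples and may differ from the actual CSV schemas in this repository.

```python
import pandas as pd

# Hypothetical raw rows as they might come out of the scrapers.
raw = pd.DataFrame({
    "chain":     ["Amavita", "Amavita", "Sunstore"],
    "product":   ["Dafalgan 500 mg", "Dafalgan 500 mg", "Dafalgan 500 mg"],
    "price_chf": [7.90, 7.90, 7.50],
    "units":     [16, 16, 16],
})

# Step 2: deduplicate and derive a price-per-unit feature.
cleaned = raw.drop_duplicates().copy()
cleaned["price_per_unit"] = cleaned["price_chf"] / cleaned["units"]

# Step 3: compare average unit prices across chains.
comparison = cleaned.groupby("chain")["price_per_unit"].mean()
print(comparison)
```

The real pipeline (see transform_phase and group_work_phase) additionally merges the three chains' cleaned CSVs before the comparison.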

Key Findings

Directory Structure

📦 Groupwork
┣ 📂 Andri
┃ ┣ 📂 data
┃ ┃ ┣ 🔒📜 sunstore_cleaned_painkillers_2024_10_31_22h_36m.csv
┃ ┃ ┣ 🔒📜 sunstore_cleaned_store_2024_10_31_22h_37m.csv
┃ ┃ ┗ 📜 .gitignore
┃ ┣ 📂 extract_phase
┃ ┃ ┣ 📂 sunstore_beautiful_soup_scraper
┃ ┃ ┃ ┗ 📂 sunstore_beautiful_soup_scraper
┃ ┃ ┃   ┣ 📂 output
┃ ┃ ┃   ┃ ┣ 🔒📜 painkillers_Date_2024_10_18_Time_01h_06m.csv
┃ ┃ ┃   ┃ ┗ 📜 .gitignore
┃ ┃ ┃   ┣ 📂 spiders
┃ ┃ ┃   ┃ ┣ 📜 __init__.py
┃ ┃ ┃   ┃ ┗ 📜 sunstore_spider.py
┃ ┃ ┃   ┣ 📜 __init__.py
┃ ┃ ┃   ┣ 📜 settings.py
┃ ┃ ┃   ┗ 📜 scrapy.cfg
┃ ┃ ┗ 📂 sunstore_sel_scraper
┃ ┃   ┗ 📂 sunstore_sel_scraper
┃ ┃     ┣ 📂 spiders
┃ ┃     ┃ ┣ 📂 outputs
┃ ┃     ┃ ┃ ┣ 📜 .gitignore
┃ ┃     ┃ ┃ ┗ 🔒📜 storepickup_Data_19_Oct_2024_13_46_52.xlsx
┃ ┃     ┃ ┣ 📜 __init__.py
┃ ┃     ┃ ┗ 📜 sunstore_sel_spider.py
┃ ┃     ┣ 📜 __init__.py
┃ ┃     ┣ 📜 settings.py
┃ ┃     ┗ 📜 scrapy.cfg
┃ ┗ 📂 transform_phase
┃ ┃ ┣ 📂 modules
┃ ┃ ┃ ┣ 📜 __init__.py
┃ ┃ ┃ ┣ 📜 data_io.py
┃ ┃ ┃ ┣ 📜 openai_utils.py
┃ ┃ ┃ ┗ 📜 processing.py
┃ ┃ ┣ 📜 config.json
┃ ┃ ┗ 📜 main.py
┣ 📂 data
┃ ┣ 📜 .gitignore
┃ ┣ 🔒📜 amavita_cleaned_painkillers_20241101.csv
┃ ┣ 🔒📜 amavita_cleaned_store_20241105.csv
┃ ┣ 🔒📜 coop_vitality_cleaned_painkillers.csv
┃ ┣ 🔒📜 coop_vitality_cleaned_store_data.csv
┃ ┣ 🔒📜 sunstore_cleaned_painkillers_2024_10_31_22h_36m.csv
┃ ┗ 🔒📜 sunstore_cleaned_store_2024_10_31_22h_37m.csv
┣ 📂 group_work_phase
┃ ┣ 📂 modules
┃ ┃ ┣ 📜 __init__.py
┃ ┃ ┣ 📜 data_merger.py
┃ ┃ ┣ 📜 painkiller_analysis.py
┃ ┃ ┗ 📜 store_analysis.py
┃ ┣ 📜 analysis_store.ipynb
┃ ┣ 📜 analysis_painkiller.ipynb
┃ ┗ 📜 availability_summary.html
┣ 📂 Manuela
┃ ┣ 📂 data
┃ ┃ ┣ 🔒📜 amavita_cleaned_painkillers_20241101.csv
┃ ┃ ┣ 🔒📜 amavita_cleaned_store_20241105.csv
┃ ┃ ┣ 🔒📜 amavita_painkillers_20241101.csv
┃ ┃ ┣ 🔒📜 amavita_painkillers_updated_20241101.csv
┃ ┃ ┣ 🔒📜 amavita_stores_20241027.csv
┃ ┃ ┣ 🔒📜 AMTOVZ_CSV_LV95.csv
┃ ┃ ┣ 🔒📜 px-x-0103010000_201_20241030-075409.csv
┃ ┃ ┗ 📜 .gitignore
┃ ┣ 📂 extract_phase
┃ ┃ ┣ 📜 amavita_painkillers.py
┃ ┃ ┗ 📜 amavita_stores.py
┃ ┣ 📂 transform_phase
┃ ┃ ┣ 📜 CleanUP_amavita_painkillers.py
┃ ┃ ┗ 📜 CleanUP_amavita_stores.py
┃ ┗ 📜 test.py
┣ 📂 Remigiusz
┃ ┣ 📂 data
┃ ┃ ┣ 🔒📜 coop_vitality_cleaned_painkillers.csv
┃ ┃ ┣ 🔒📜 coop_vitality_cleaned_store_data.csv
┃ ┃ ┣ 🔒📜 coop_vitality_painkillers.csv
┃ ┃ ┣ 🔒📜 coop_vitality_store_data.csv
┃ ┃ ┣ 🔒📜 27864_131.csv
┃ ┃ ┣ 🔒📜 AMTOVZ_CSV_WGS84.csv
┃ ┃ ┗ 📜 .gitignore
┃ ┣ 📂 extract_phase
┃ ┃ ┣ 📜 coop_vitality_scraper_painkillers.py
┃ ┃ ┣ 📜 coop_vitality_scraper_store_data.py
┃ ┃ ┣ 📜 robots_checker_scraper_painkillers.py
┃ ┃ ┗ 📜 robots_checker_scraper_store_data.py
┃ ┗ 📂 transform_phase
┃   ┣ 📜 coop_vitality_cleaning_transforming_painkillers.ipynb
┃   ┗ 📜 coop_vitality_cleaning_transforming_store_data.ipynb
┣ 📂 saves
┃ ┗ 📜 .gitignore
┣ 🔒📜 .env (with OPENAI_API_KEY=your_key)
┣ 📜 README.md
┗ 📜 requirements.txt

Note: The datasets marked with 🔒 are not included in the repository. In addition, the .env file is not committed and must be created manually with your personal API key.

Getting Started

Install the required Python packages and provide your OpenAI API key:

pip install -r requirements.txt

Create a .env file in the repository root containing your personal key:

OPENAI_API_KEY=***
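The key in .env then needs to be loaded into the environment at runtime. A minimal stdlib-only sketch of such a loader is shown below; the project itself (e.g. in openai_utils.py) may instead rely on a package such as python-dotenv, so treat this as illustrative.

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Read KEY=value lines from a .env file into os.environ (no overwrite)."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_env()
api_key = os.environ.get("OPENAI_API_KEY")
```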
  1. Email: andri.gerber@stud.hslu.ch. Department of Business, Lucerne University of Applied Sciences and Arts (HSLU), Lucerne, Switzerland. ORCiD ID

  2. Email: manuela.flury@stud.hslu.ch. Department of Business, Lucerne University of Applied Sciences and Arts (HSLU), Lucerne, Switzerland.

  3. Email: remigiusz.trydulski@stud.hslu.ch. Department of Business, Lucerne University of Applied Sciences and Arts (HSLU), Lucerne, Switzerland.