Exploring Airbnb Prices in Geneva
Authors: Andri O. Gerber1, Matthias Schmied2
Airbnb, a prominent platform for home-sharing and vacation rentals, has become synonymous with the global phenomenon of peer-to-peer accommodation. Given the vast amount of data generated by Airbnb listings, analyzing those data sets offers valuable insights into the dynamics of the platform and its implications on urban housing markets, travel patterns, and local economies. In this exploratory analysis, we delve deep into various facets of the Airbnb data sets, examining key indicators that influence both pricing and occupancy.
Our visualization journey begins with a detailed view of the Occupancy over Time. This plot reveals the ebb and flow of Airbnb accommodations, with special markers denoting public holidays, a crucial determinant for understanding seasonal patterns and external influences on occupancy rates.
How does occupancy rate trend around the times of public holidays? Is there a significant increase in bookings just before or right after these holidays?
Are there patterns of repeated spikes in occupancy during certain times of the year?
We then shift our focus to the Average Daily and Monthly Prices, aiming to capture temporal pricing fluctuations, possibly driven by demand-supply imbalances, seasonal variations, or events. To further enrich our comprehension, we analyze the Price Difference over Weekdays, elucidating whether certain days attract premium pricing over others.
Do the daily prices show noticeable fluctuations throughout the month, indicating peak periods where demand outpaces supply?
How do monthly prices evolve over the seven months from January to July? Are there particular months that consistently stand out as either high or low pricing periods?
Does the weekday pricing analysis suggest a premium on specific days of the week, such as Fridays or Sundays, indicating popular check-in or check-out preferences?
Recognizing the wide range of listing prices, we have bifurcated the Listings Price Distribution into two segments: one that showcases listings up to 1,000 CHF and another dedicated to those priced over 1,000 CHF. This segregation offers a clearer picture of the distribution without skewing the visual representation.
What proportion of the total listings falls under the 1,000 CHF threshold, and how does this compare to the listings priced above 1,000 CHF?
Do listings priced over 1,000 CHF tend to cluster within a certain price range, or is there a wide dispersion even within this premium segment?
The significance of location in real estate is unquestionable. With this in mind, we’ve mapped out Price by Neighbourhoods.
Similarly, understanding that the type of property and room can significantly influence the price, we’ve delineated the Price by Property Type and Price by Room Type. These plots offer a granular perspective on how various listing attributes contribute to pricing.
Which neighborhoods consistently command premium prices?
Are there specific neighborhoods that offer more budget-friendly options?
How do property types, such as villas or apartments, influence the listing price? Is there a clear hierarchy in pricing based on property types?
After the factors of location and property type, we turn to amenities. Using an Airbnb survey, we examine their influence on rental prices in Geneva.
To better understand the interplay between room types, neighborhoods and other categories, we have integrated a Shiny app.
For our predictive efforts, we’ve selected a logistic model with the aim of forecasting occupancy. The chosen variables - price, month, day of the week, and whether it’s a public holiday - are integral in understanding the nuanced interplay that governs the likelihood of a listing being occupied.
How significant is each predictor (price, month, day of the week, and public holiday) in influencing the likelihood of occupancy in the model?
Are there specific months that stand out as having a higher or lower probability of occupancy, holding other variables constant?
How does the day of the week influence occupancy? For instance, are weekends more likely to see higher occupancy compared to weekdays?
Does the presence of a public holiday increase the likelihood of a listing being occupied?
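As a rough illustration of the kind of model these questions refer to, the following sketch fits a logistic regression on simulated data. The data frame `toy`, its coefficients, and the sample size are assumptions made up for this example; only the variable names mirror the report, and the fit shown here is not the report's actual model.

```r
# Minimal sketch on simulated data (all values are hypothetical)
set.seed(1)
n <- 500
toy <- data.frame(
  price = runif(n, 50, 300),
  month = factor(sample(month.abb[1:7], n, replace = TRUE), levels = month.abb[1:7]),
  dayweek = factor(sample(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"), n, replace = TRUE)),
  is_holiday = rbinom(n, 1, 0.3)
)

# Simulate occupancy: higher prices lower the odds, holidays raise them
p <- plogis(1 - 0.005 * toy$price + 0.5 * toy$is_holiday)
toy$occupancy <- rbinom(n, 1, p)

# Logistic regression of occupancy on the four predictors
fit <- glm(occupancy ~ price + month + dayweek + is_holiday,
           data = toy, family = binomial)

# Exponentiated coefficients can be read as odds ratios
odds_ratios <- exp(coef(fit))
summary(fit)$coefficients
```

The coefficient table reports an estimate and a p-value per predictor level, which is the basis for answering how much each variable shifts the odds of a listing being occupied when the others are held constant.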
Following our in-depth exploration of Airbnb listings based on various factors, we’ve undertaken a spatial analysis. This approach introduces an additional dimension, offering a vivid visual perspective of listing distributions across the canton.
In sum, this analysis seeks to uncover patterns, unearth anomalies, and predict future behaviors, providing stakeholders with actionable insights into the intricate workings of Airbnb’s marketplace. Join us as we journey through these data-driven narratives, discovering the stories they unveil 🏠📊🔍.
# Locale Setting
Sys.setlocale("LC_TIME", "en_EU.UTF-8") # Set time-related locale to English
# (European format)
# Visualization & Reporting
library(ggplot2) # Data visualization
library(knitr) # Document integration
library(kableExtra) # Table formatting
library(pander) # R to Pandoc conversion
library(shiny) # Web apps
library(ggmap) # Maps
library(plotly) # Interactive plots
library(gridExtra) # Arrange plots
library(huxtable) # Styled tables
library(DT) # Interactive tables
library(viridis) # Colorblind-friendly color palettes
# Data Manipulation & Exploration
library(dplyr) # Data manipulation
library(DataExplorer) # EDA
library(lubridate) # Date-time functions
library(tidyverse) # Data science tools
library(psych) # Psychometrics
library(readxl) # Excel data import
library(testthat) # Unit test
# Spatial Data & Analysis
library(sf) # Spatial data
library(osmdata) # OpenStreetMap
library(spatstat) # Spatial statistics
library(sp) # Spatial data classes
# Analysis & Modeling
library(jtools) # Research tools
library(broom.mixed) # Tidy mixed models
library(vcd) # Categorical data
library(summarytools) # Summary tools
Set the locale setting to English (European format) and loaded several R packages to facilitate visualization, data manipulation, spatial data analysis, and modeling.
Imported the December, March, and June snapshots of listings and calendar.
Combined the snapshots into the listings and calendar data sets.
Ensured that the date column in the calendar data set is interpreted as a date and filtered the data set to keep records from January through July 2023.
Imported the holidays data set from the opendata.swiss website and transformed it.
📌 Highlight: Used left_join() to join the holidays data set to the calendar data set on the basis of the date column, adding an is_holiday column to denote whether a date is a holiday or not.
The exact variable description can be found in the appendix.
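The holiday flag and the join described above can be sketched on a small example. The periods, dates, and listing IDs below are invented for illustration, and base R's `merge(..., all.x = TRUE)` stands in for `left_join()` so that the sketch runs without extra packages.

```r
# Hypothetical holiday periods (start and end dates are made up)
periods <- data.frame(
  start_date = as.Date(c("2023-02-20", "2023-04-06")),
  end_date   = as.Date(c("2023-02-24", "2023-04-14"))
)

# One row per calendar day in a short window
days <- data.frame(date = seq(as.Date("2023-02-18"), as.Date("2023-02-26"), by = "1 day"))

# A day is a holiday if it lies inside at least one period
days$is_holiday <- vapply(
  seq_along(days$date),
  function(i) any(periods$start_date <= days$date[i] & periods$end_date >= days$date[i]),
  logical(1)
)
days$is_holiday <- as.integer(days$is_holiday)

# Toy calendar with two listings per day; the left join keeps every calendar row
toy_calendar <- data.frame(
  listing_id = rep(c(1, 2), each = nrow(days)),
  date = rep(days$date, times = 2)
)
joined <- merge(toy_calendar, days, by = "date", all.x = TRUE)
```

Because the lookup table has exactly one row per date, the join cannot duplicate calendar rows, which is worth checking whenever a lookup table is built by hand.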
# path
base_path <- "../Data/"
# listings (3 datasets)
listings_dec <- read.csv(file.path(base_path, "listings_december.csv.gz"))
listings_dec$period <- 1
listings_mar <- read.csv(file.path(base_path, "listings_march.csv.gz"))
listings_mar$period <- 2
listings_jun <- read.csv(file.path(base_path, "listings_june.csv.gz"))
listings_jun$period <- 3
# check
dim(listings_dec)
dim(listings_mar)
dim(listings_jun)
# join
listings <- rbind(listings_dec, listings_mar, listings_jun)
head(listings)
nrow(listings)
# calendar (3 datasets)
cal_dec <- read.csv(file.path(base_path, "calendar_december.csv.gz"))
cal_mar <- read.csv(file.path(base_path, "calendar_march.csv.gz"))
cal_jun <- read.csv(file.path(base_path, "calendar_june.csv.gz"))
# join
calendar <- rbind(cal_dec, cal_mar, cal_jun)
# ensure that the column "date" is interpreted as a date
class(calendar$date)
calendar$date <- as.Date(calendar$date)
# Keep 'calendar' rows from 2023-01-01 up to (but not including) 2023-08-01, because values are missing from August onward
calendar <- calendar %>%
  filter(
    date < as.Date("2023-08-01", format = "%Y-%m-%d") &
      date >= as.Date("2023-01-01", format = "%Y-%m-%d")
  )
# check dates (min/max); possibly add a unit test
range_dates <- range(calendar$date, na.rm = TRUE)
# 'na.rm = TRUE' ignores missing values
print(paste("The dates range from", range_dates[1], "to", range_dates[2]))
# import holidays dataset
holidays_raw <- read.csv(file.path(base_path, "schulferien.csv"), sep = ",", header = TRUE, stringsAsFactors = FALSE)
# transform from chr to date
holidays_raw <- holidays_raw %>%
mutate(
start_date = as.Date(start_date, format = "%Y-%m-%d %H:%M:%S"),
end_date = as.Date(end_date, format = "%Y-%m-%d %H:%M:%S")
)
# create new data.frame
holidays <- data.frame(date = seq(as.Date("2023-01-01"), as.Date("2023-07-31"), by = "1 day"))
# check if a date from holidays falls within an interval in holidays_raw
holidays$is_holiday <- sapply(holidays$date, function(d) {
any(holidays_raw$start_date <= d & holidays_raw$end_date >= d)
})
# convert boolean values to numeric 0 (no holiday) 1 holiday
holidays$is_holiday <- as.integer(holidays$is_holiday)
# Joining is_holiday to the calendar data frame based on the date
calendar <- calendar %>%
left_join(holidays, by = "date")
# date
calendar$date <- as.Date(calendar$date)
# Price
# Before transformation
na_and_empty_count_before <- sum(is.na(calendar$price) | calendar$price == "")
# transformation price: taking "$" and "," away for numeric
calendar$price <- gsub("\\$", "", calendar$price)
calendar$price <- gsub(",", "", calendar$price)
calendar$price <- as.numeric(calendar$price)
# After transformation
na_count_after <- sum(is.na(calendar$price))
Ensured that the date column was correctly formatted.
Cleaned the price column to remove currency symbols and commas, converting it to numeric format.
📌 Highlight: Created a unit test to ensure that only empty strings were converted to NA in the price column during the transformation process.
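The idea behind this check can be sketched on a toy vector. The price strings and the helper `clean_price()` are hypothetical and exist only for this example; the sketch uses base R instead of testthat to stay self-contained.

```r
# Hypothetical raw price strings in the "$1,234.00" style, plus one empty string
raw_price <- c("$85.00", "$1,250.00", "", "$40.00")

clean_price <- function(x) {
  # Drop the dollar sign and thousands separators, then convert to numeric
  as.numeric(gsub("[$,]", "", x))
}

# Count empty or missing entries before the conversion
n_empty_before <- sum(is.na(raw_price) | raw_price == "")

price <- clean_price(raw_price)

# Count missing values after the conversion
n_na_after <- sum(is.na(price))

# If both counts match, no non-empty string was lost in the conversion
stopifnot(n_empty_before == n_na_after)
```

If a non-empty string failed to parse (for example because of an unexpected currency symbol), the count after the conversion would exceed the count before it and the check would fail.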
# Unit Test that compares transformation "same length":
test_that("Only empty strings were converted to NA in calendar$price", {
expect_equal(na_and_empty_count_before, na_count_after)
})
# transformation price: taking "$" and "," away for numeric
listings$price <- gsub("\\$", "", listings$price)
listings$price <- gsub(",", "", listings$price)
listings$price <- as.numeric(listings$price)
Transformed the price column similar to the calendar data set.
# add swiss franc
class(calendar$price)
calendar$price_swiss_franc <- calendar$price * 0.88
# create mean_occupancy
occupancy_by_date <- calendar %>%
mutate(occupancy = ifelse(available == 't', 0, 1)) %>%
group_by(date) %>%
summarise(mean_occupancy = mean(occupancy)) %>%
ungroup()
# join mean_occupancy to calendar
calendar <- left_join(calendar, occupancy_by_date, by = "date")
# add occupancy
calendar <- calendar %>%
mutate(occupancy = ifelse(available == 't', 0, 1))
# add months names
calendar <- calendar %>%
mutate(month_name = month(date, label = TRUE, abbr = TRUE))
# add week names
calendar <- calendar %>%
mutate(dayweek = weekdays(as.Date(date)))
Converted the price column to Swiss Francs.
Calculated mean_occupancy for each date.
Added occupancy, month name (month_name), and the name of the day of the week (dayweek).
# add swiss franc
class(listings$price)
listings$price_swiss_franc <- listings$price * 0.88
# add amenities groups (case-insensitive keyword matching)
listings <- listings %>%
  mutate(
    showergel_or_shampoo = grepl("(shower\\s*-*gel)|(shampoo)", amenities, ignore.case = TRUE),
    wifi = grepl("wifi", amenities, ignore.case = TRUE),
    freeparking = grepl("free\\s*-*parking", amenities, ignore.case = TRUE),
    pool = grepl("(pool)|(jacuzzi)", amenities, ignore.case = TRUE),
    dishwasher = grepl("dish\\s*washer", amenities, ignore.case = TRUE),
    # "washer" also matches "Dishwasher"
    washer = grepl("washer", amenities, ignore.case = TRUE),
    selfcheckin = grepl("self\\s*check-*\\s*in", amenities, ignore.case = TRUE),
    petsallowed = grepl("pets\\s*allowed", amenities, ignore.case = TRUE),
    refrigerator = grepl("refrigerator", amenities, ignore.case = TRUE),
    airconditioner = grepl("air\\s*conditioner", amenities, ignore.case = TRUE)
  )
# add a column to sum the amenities for each row
listings$row_sums <-
rowSums(listings[, c(
"showergel_or_shampoo",
"wifi",
"freeparking",
"pool",
"dishwasher",
"washer",
"selfcheckin",
"petsallowed",
"refrigerator",
"airconditioner"
)])
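The keyword matching used for the amenity flags can be tried out on a few made-up strings. The amenity texts below are hypothetical, and the word-boundary variant is an optional refinement, not what the report's code does. As a general caveat, a bracket expression such as `[P-p]` denotes a range of characters rather than the two letters P and p, so with `ignore.case = TRUE` a plain lowercase pattern is the simpler choice.

```r
# Hypothetical amenities strings, formatted like a JSON-style list
amenities <- c(
  '["Wifi", "Dishwasher", "Free parking on premises"]',
  '["Shampoo", "Washer", "Pool table"]',
  '["Kitchen", "Cooling fan"]'
)

# Case-insensitive keyword match
has_wifi <- grepl("wifi", amenities, ignore.case = TRUE)

# \\b restricts the match to the whole word "pool"
has_pool <- grepl("\\bpool\\b", amenities, ignore.case = TRUE)

# "washer" is a substring of "Dishwasher", so the loose pattern flags both
has_washer_loose  <- grepl("washer", amenities, ignore.case = TRUE)
has_washer_strict <- grepl("\\bwasher\\b", amenities, ignore.case = TRUE)

# Number of flagged amenities per listing
amenity_count <- rowSums(cbind(has_wifi, has_pool, has_washer_strict))
```

Whether the loose or the strict washer pattern is preferable depends on whether a dishwasher should count towards the washer flag; the two choices lead to different row sums.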
Converted the price column to Swiss Francs.
Flagged groups of amenities and calculated their row sum.
# calendar_short
calendar_short <- calendar %>%
select(price_swiss_franc,
listing_id,
date,
available,
is_holiday,
mean_occupancy,
occupancy,
month_name,
dayweek)
# listings_short
listings_short <- listings %>%
select(price_swiss_franc,
id,
property_type,
room_type,
neighbourhood_cleansed,
period,
amenities,
latitude,
longitude)
Created subsets of the calendar and listings data sets, named calendar_short and listings_short, to focus on key variables.
# overview
summary(calendar_short)
str(calendar_short)
dplyr::glimpse(calendar_short)
psych::describe(calendar_short)
summary(calendar_short)
DataExplorer::plot_bar(calendar_short)
cat("<div style='overflow-x: auto; width: 100%; max-height: 500px;'>")
print(dfSummary(calendar_short), method = 'render')
No | Variable | Freqs (% of Valid) | Valid | Missing
---|---|---|---|---
1 | price_swiss_franc [numeric] | 1367 distinct values | 837621 (100.0%) | 330 (0.0%)
2 | listing_id [numeric] | 2975 distinct values | 837951 (100.0%) | 0 (0.0%)
3 | date [Date] | 212 distinct values | 837951 (100.0%) | 0 (0.0%)
4 | available [character] | | 837951 (100.0%) | 0 (0.0%)
5 | is_holiday [integer] | | 837951 (100.0%) | 0 (0.0%)
6 | mean_occupancy [numeric] | 198 distinct values | 837951 (100.0%) | 0 (0.0%)
7 | occupancy [numeric] | | 837951 (100.0%) | 0 (0.0%)
8 | month_name [ordered, factor] | | 837951 (100.0%) | 0 (0.0%)
9 | dayweek [character] | | 837951 (100.0%) | 0 (0.0%)
Generated by summarytools 1.0.1 (R version 4.2.2)
2023-09-30
cat("</div>")
For the distribution plot of variables within the calendar data set and for additional insights, please refer to the appendix.
# overview
str(listings_short)
dplyr::glimpse(listings_short)
psych::describe(listings_short)
summary(listings_short)
DataExplorer::plot_bar(listings_short)
# DataExplorer::plot_missing(listings_short, title = "Missing Values Listings")
# profile_missing(listings$neighbourhood_cleansed)
cat("<div style='overflow-x: auto; width: 100%; max-height: 500px;'>")
print(dfSummary(listings_short), method = 'render')
No | Variable | Freqs (% of Valid) | Valid | Missing
---|---|---|---|---
1 | price_swiss_franc [numeric] | 463 distinct values | 6932 (100.0%) | 0 (0.0%)
2 | id [numeric] | 2976 distinct values | 6932 (100.0%) | 0 (0.0%)
3 | property_type [character] | | 6932 (100.0%) | 0 (0.0%)
4 | room_type [character] | | 6932 (100.0%) | 0 (0.0%)
5 | neighbourhood_cleansed [character] | | 6932 (100.0%) | 0 (0.0%)
6 | period [numeric] | | 6932 (100.0%) | 0 (0.0%)
7 | amenities [character] | | 6932 (100.0%) | 0 (0.0%)
8 | latitude [numeric] | 3174 distinct values | 6932 (100.0%) | 0 (0.0%)
9 | longitude [numeric] | 3479 distinct values | 6932 (100.0%) | 0 (0.0%)
Generated by summarytools 1.0.1 (R version 4.2.2)
2023-09-30
cat("</div>")
For the distribution plot of variables within the listings data set and for additional insights, please refer to the appendix.
The calendar_short dataset consists of a substantial 837,951 entries across 9 variables. Within this set:
The price_swiss_franc variable displays a mean listing price of 167.8 CHF with a considerable range, spanning from 8.8 CHF to an outlier value of 79,393.6 CHF. Despite the broad spectrum of listing prices, the median remains at a more modest 101.2 CHF, indicative of a right-skewed distribution.
Occupancy data suggests robust demand in the area: on average, 58.9 % of listing-nights are booked. The data set further enriches this perspective with a holiday variable, revealing that 32.3 % of the records fall on holidays.
In comparison, the listings_short dataset, with 6,932 entries spread over 9 variables, provides detailed insights into property specifics:
The property_type variable delineates the listings landscape. The dominant listing type is “Entire rental unit” accounting for 57.3 %, followed by “Private room in rental unit” at 17.2 %, offering a glance into the prevalent accommodation preferences.
Diving deeper, the room_type variable showcases that a considerable 70.8 % of the listings are categorized as “Entire home/apt”, while “Private rooms” comprise 28.4 %. Geographically, most listings, precisely 72.9 %, are situated in the “Commune de Genève” region, indicating its allure.
Price points in this data set present an average of 170.1 CHF, with the range mirroring its counterpart data set and prices scaling up to 79,393.6 CHF.
# calendar (selected variables)
calendar_missings_plot <- calendar_short %>%
DataExplorer::plot_missing(
title = "Calendar Dataset",
group = list(
"Marginal fraction" = 0.05,
OK = 0.4,
Bad = 0.8,
Remove = 1
)
) + xlab("Variables") + ylab("Missing rows")
# listings (selected variables)
listings_missings_plot <- listings_short %>%
DataExplorer::plot_missing(
title = "Listings Dataset",
group = list(
"Marginal fraction" = 0.05,
OK = 0.4,
Bad = 0.8,
Remove = 1
)
) + xlab("Variables") + ylab("Missing rows")
# Plot them side by side
grid.arrange(calendar_missings_plot, listings_missings_plot, ncol = 2)
# Numbers for text
calendar_missing_text <- calendar_missings_plot$data[1,1:3]
The listings data set is devoid of missing values, indicating meticulous data recording for each property.
In the calendar data set, only the price_swiss_franc variable presents a data gap with 330 missing entries, a mere 0.04 % of the data set. This minor discrepancy, while worth noting, is unlikely to hinder in-depth analyses or interpretations.
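The quoted share follows directly from the two counts in the summary table. The sketch below reproduces it and adds a small helper, `missing_share()`, which is a hypothetical convenience function for this example and not part of the report's code.

```r
# Share of missing prices, using the counts reported above
n_total   <- 837951  # rows in calendar_short
n_missing <- 330     # missing price_swiss_franc values
overall_pct <- round(100 * n_missing / n_total, 2)

# Hypothetical helper: percentage of missing values per column
missing_share <- function(df) {
  round(100 * colMeans(is.na(df)), 2)
}

# Toy data frame with one missing price
toy <- data.frame(price = c(100, NA, 80, 120), listing_id = 1:4)
missing_share(toy)
```

On the toy data the helper reports a quarter of the prices as missing and none of the IDs, which mirrors what the missing-values plot shows per variable.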
With these insights in hand, we’re primed to delve deeper into the data sets through visual exploration.
# The minimum value
min_val <- min(calendar$mean_occupancy*100, na.rm = TRUE)
occupancy_time_plot <-
ggplot(calendar, aes(x = date, y = mean_occupancy*100)) +
# Rectangle for holiday
geom_rect(data = subset(calendar, is_holiday == 1), # 1 = holiday
aes(xmin = date - 0.5,
xmax = date + 0.5,
ymin = min_val - 0.02*100,
ymax = min_val - 0.01*100,
fill = "School holidays"), # Holiday fill
alpha = 0.1) +
scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
geom_line() +
geom_smooth() +
# Create a legend
scale_fill_manual(name = "",
values = c("School holidays" = "red")) +
ggtitle("Airbnb Geneva Occupancy over Time") +
ylab("Occupied (%)") +
xlab("Date")
occupancy_time_plot
# Table average monthly occupancy
mean_of_month_occupancy <- calendar %>%
mutate(month_name = month(date, label = TRUE, abbr = TRUE)) %>%
group_by(month_name) %>%
summarise(mean_occupancy = mean(occupancy*100, na.rm = TRUE))
# Calculate income
income_df <- calendar %>%
group_by(listing_id) %>%
summarise(
avg_price = round(mean(price_swiss_franc, na.rm = TRUE)),
avg_occupied = round(mean(occupancy*100, na.rm = TRUE), 2),
annual_income = round((avg_occupied/100) * avg_price * 365)
) %>%
arrange(desc(annual_income))
Our visualization journey into Airbnb Occupancy over Time uncovers intricate patterns of how accommodations are filled throughout the year, with keen attention to public holidays as potential influencers of seasonal demand. Starting in January with a mean occupancy rate of about 59.7 %, one sees fluctuations that might be linked to public holidays. However, as the year unfolds from January to mid-April, a consistent decline is observed, reaching its lowest in March at 46 %, regardless of any intervening holidays. This trend is interrupted by a striking rise in late April, registering an occupancy of 79.5 %, even before major public holidays make their mark. The subsequent months of May and June bring a gentle undulating decline, only to rise again in July to 70 %, potentially synchronized with summer breaks. Yet, post mid-August, a gradual descent is again evident.
While certain public holidays do overlap with these occupancy peaks, others align with dips, suggesting that while holidays influence booking patterns, they aren’t always the predominant factors. More notable is the cyclical nature of the data: an early-year trough is followed by an April peak, a mid-year wane, and a summer spike.
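The occupancy percentages discussed here are daily shares of listings marked as unavailable. A minimal sketch of that calculation follows; the dates and availability flags are invented, and the sketch shares the report's working assumption that every unavailable night counts as booked, although a host may also block nights for other reasons.

```r
# Toy calendar: four listings on each of two days ("t" = available, "f" = not available)
toy <- data.frame(
  date = as.Date(rep(c("2023-04-28", "2023-04-29"), each = 4)),
  available = c("t", "f", "f", "f", "t", "t", "f", "f")
)

# Treat every unavailable night as occupied
toy$occupancy <- ifelse(toy$available == "t", 0, 1)

# Mean occupancy per date, expressed in percent
daily <- aggregate(occupancy ~ date, data = toy, FUN = mean)
daily$occupied_pct <- 100 * daily$occupancy
daily
```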
Building on the broader trends in occupancy, it’s enlightening to dive into the financial implications for individual Airbnb listings. Average prices per listing vary considerably, ranging from 17 CHF to a towering 24,705 CHF. The top-tier listings by annual income showcase a striking range in both pricing and occupancy. For instance, our standout listing, with an astonishing average price of 24,705 CHF and an occupancy rate of 83 %, achieves an annual income of 7,453,721 CHF, far exceeding the market’s average of 30,822 CHF. Notably, some listings have carved out a niche for themselves with full occupancy, irrespective of their pricing range. In contrast, others, despite premium pricing, struggle with consistent occupancy, underscoring the delicate balance between price, value, and demand.
On the other side of the coin, the lowest end of our earnings spectrum starkly contrasts the top, revealing listings with zero earnings, irrespective of their set price points. Puzzlingly, such listings command prices as high as 704 CHF or as modest as 69 CHF, yet they grapple with zero occupancy. The median income across listings settles at 20,320 CHF, and the lowest dips down to 0 CHF.
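The annual income figures rest on a simple extrapolation: average price times occupancy share times 365 days. The function below is a sketch of that formula with a hypothetical name and made-up inputs; it assumes that the price and occupancy observed in the study window hold for the whole year.

```r
# Hypothetical helper: extrapolate yearly income from average price and occupancy
annual_income <- function(avg_price, occupied_pct, days = 365) {
  round((occupied_pct / 100) * avg_price * days)
}

# A listing at 100 CHF per night, occupied half of the time
annual_income(100, 50)

# A listing that is never occupied earns nothing, whatever its price
annual_income(704, 0)
```

Because the estimate scales linearly with price, a single listing with an extreme price dominates the top of the income table even if its occupancy is unremarkable.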
# Interactive table
datatable(
income_df,
options = list(pageLength = 5),
caption = "Annual Income per Listing ID",
colnames = c(
"Listing ID",
"Average price (CHF)",
"Average occupancy (%)",
"Annual income (CHF)"
)
)
Now, set against this backdrop, a pertinent question arises: How do these individual prices mesh with the average listing price across days or weeks? Are top earners an average reflection, or mere outliers?
# Calculate average daily price
mean_of_day <- calendar %>%
group_by(date) %>%
summarise(mean_price = mean(price_swiss_franc, na.rm = TRUE))
# Plot
plot_average_listing_price_across_days <- ggplot(mean_of_day, aes(x = date, y = mean_price)) +
  geom_point(na.rm = TRUE, alpha = 0.5, color = "#007A87") +
  # LOESS trend line with confidence band
  geom_smooth(color = "#FF5A5F", method = "loess", se = TRUE) +
  ggtitle("Average Listing Price across Days") +
  labs(x = "Month", y = "Average price (CHF)") +
  theme(
    plot.title = element_text(face = "bold")
  )
# Table average monthly price
mean_of_month <- calendar %>%
mutate(month_name = month(date, label = TRUE, abbr = TRUE)) %>%
group_by(month_name) %>%
summarise(mean_price = mean(price_swiss_franc, na.rm = TRUE))
plot_average_listing_price_across_days
# overview
psych::describe(calendar$price_swiss_franc)
# number of over 1'000 chf
nu_over_1000 <- length(calendar$price_swiss_franc[calendar$price_swiss_franc>1000])
# number of under 1'000 chf
nu_under_1000 <- length(calendar$price_swiss_franc[calendar$price_swiss_franc<=1000])
# number of prices below 1 CHF (including 0)
nu_between_01 <- sum(calendar$price_swiss_franc >= 0 & calendar$price_swiss_franc < 1, na.rm = TRUE)
nu_tot <- length(calendar$price_swiss_franc)
The analysis of the average listing prices across months reveals that the price starts at 162 CHF in January and drops to 157 CHF by February. This is followed by a gradual increase from March, reaching 171 CHF in May, before holding steady at 175 CHF in June and July. Notably, the data points in January and July lie farther from the red trendline, indicating potential deviations. This suggests that the top earners do not necessarily represent the average. This assertion is further substantiated by the fact that out of 837,951 observations, only 7,591 are priced over 1,000 CHF, signifying that they are exceptions rather than the norm. A further examination of the Price Difference over Weekdays could provide insights into weekly fluctuations within the months.
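When counting observations above a threshold in a column that contains missing values, the counting method matters. The sketch below uses a made-up price vector to show the difference between subsetting and summing a logical condition; it is an illustration, not a recomputation of the figures above.

```r
# Hypothetical prices with one missing value
price <- c(50, 1500, NA, 120, 2000)

# Subsetting with a condition keeps an NA for every missing price
n_over_subset <- length(price[price > 1000])

# Summing the condition with na.rm = TRUE counts only real matches
n_over_sum <- sum(price > 1000, na.rm = TRUE)

# Share of non-missing prices above the threshold, in percent
share_over <- 100 * n_over_sum / sum(!is.na(price))
```

On this vector the subset-based count is one higher than the sum-based count, because the missing price is carried along as an extra element.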
# Mean price per weekday and month
price_week_by_month <- calendar %>%
  group_by(dayweek, month_name) %>% # grouped for weekday/month
  summarise(mean_price = mean(price_swiss_franc, na.rm = TRUE), .groups = "drop")
# Average Price by Weekday and Month
day_order <-
c('Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday',
'Sunday')
plot_price_week_by_month <-
ggplot(
price_week_by_month,
aes(
x = dayweek,
y = mean_price,
group = month_name,
color = month_name
)
) +
geom_line() +
geom_point() +
xlab("Day of the Week") +
ylab("Average price (CHF)") +
ggtitle("Average Price by Weekday and Month") +
scale_color_brewer(palette = "Set1") +
scale_x_discrete(
limits = day_order,
labels = c("Mon", "Tues", "Weds", "Thurs", "Fri", "Sat", "Sun")
) +
labs(color = "Month", fill = "Month")
plot_price_week_by_month
A closer look at the average prices by weekday and month reveals subtle weekly price shifts throughout the year. Fridays consistently register slightly higher prices, peaking at 177 CHF in June. Saturdays begin at 164 CHF in January and peak at 177 CHF in July. In contrast, Sundays start at 162 CHF and cap at 175 CHF by July. Mid-week days, such as Tuesdays and Wednesdays, start in the low 160s in January and rise to 175 CHF by mid-year. This trend suggests that prices generally increase towards mid-year, with June frequently showing the highest averages. Importantly, the evident premium on Fridays and the slight uptick on Saturdays could imply popular check-in days or weekend getaways, whereas Sundays, which do not see as large a rise, might be less preferred for check-ins or check-outs.
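A weekday comparison of this kind can be sketched without relying on the system locale, which `weekdays()` depends on. The prices below are made up; the `"%u"` format code returns the ISO weekday number (1 for Monday through 7 for Sunday), and the labels are assigned by hand.

```r
# Two full weeks of hypothetical average prices, starting on a Monday
toy <- data.frame(
  date = as.Date("2023-01-02") + 0:13,
  price = c(100, 100, 100, 100, 120, 115, 105,
            102, 102, 102, 102, 124, 117, 107)
)

# ISO weekday number as an ordered factor with fixed English labels
toy$dayweek <- factor(
  as.integer(format(toy$date, "%u")),
  levels = 1:7,
  labels = c("Mon", "Tues", "Weds", "Thurs", "Fri", "Sat", "Sun")
)

# Mean price per weekday and the premium relative to Monday
mean_by_day <- tapply(toy$price, toy$dayweek, mean)
premium_vs_monday <- mean_by_day - mean_by_day[["Mon"]]
```

Encoding the weekday as a factor with explicit levels also fixes the axis order in plots, so no separate ordering vector is needed.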
# overview
psych::describe(listings$price_swiss_franc)
# total number of listings
nul_tot <-
length(listings$price_swiss_franc)
# number of over 1'000 chf
nul_over_1000 <-
length(listings$price_swiss_franc[listings$price_swiss_franc > 1000])
# number of under 1'000 chf
nul_under_1000 <-
length(listings$price_swiss_franc[listings$price_swiss_franc <= 1000])
# number of listings priced at 0 CHF
nul_0 <-
  sum(listings$price_swiss_franc == 0)
We now explore the pricing dynamics further with the listings_short data set. Geneva’s accommodation market primarily consists of listings below or equal to 1,000 CHF, with a staggering 6,873 properties falling into this category. Only 59 listings exceed this mark, underscoring the limited presence of luxury accommodations. The median price of 112 CHF further reaffirms the dominance of moderate pricing, whereas the presence of extreme outliers, like a listing at 79,393.6 CHF, skews the average to 141.4 CHF. These findings resonate with the earlier observation that top-priced listings don’t truly depict the average market scenario in Geneva.
# Remove outliers and plot the underlying distribution for a more comprehensive overview of the listing prices
# Generate the distribution of listing prices
listings_price_dist <- listings %>%
filter(price_swiss_franc <= 1000 & 0 < price_swiss_franc)
# Create the plot for the filtered listing prices
plot_listings_price_dist <-
ggplot(listings_price_dist, aes(x = price_swiss_franc)) + geom_histogram() +
xlab("Price (CHF)") +
ylab("Count") +
ggtitle("Listings Price Distribution ≤ 1000")
# Display the plot
plot_listings_price_dist
# additional plot over 1000
# Keep only the listings priced above 1,000 CHF to show the upper tail of the distribution
# Generate the distribution of listing prices
listings_price_dist <- listings %>%
filter(price_swiss_franc > 1000)
# Create the plot for the filtered listing prices
plot_listings_price_dist <-
ggplot(listings_price_dist, aes(x = price_swiss_franc)) + geom_histogram() +
xlab("Price (CHF)") +
ylab("Count") +
ggtitle("Listings Price Distribution > 1000")
# Display the plot
# plot_listings_price_dist
Moreover, the neighborhood-based analysis of the listings data set showcases a compelling distribution (Table: Listings per Neighbourhood). “Commune de Genève” clearly leads with a massive 5,051 listings, eclipsing the subsequent neighborhood, “Carouge,” which boasts 218 listings. However, a stark drop is observed in neighborhoods like “Gy,” “Jussy,” and “Perly-Certoux,” which house minimal listings, some with only 2.
# listings per neighbourhood
num_listings_neighbourhood <- data.frame(
listings %>%
group_by(neighbourhood_cleansed) %>%
summarise(count_id = n()) %>%
arrange(desc(count_id))
)
# Interactive table
datatable(
num_listings_neighbourhood,
options = list(pageLength = 5),
caption = "Listings per Neighbourhood",
colnames = c("Neighbourhood",
"Listings")
)
To make our analysis most relevant for the majority of potential renters or providers, we’ll concentrate on listings priced up to 1,000 CHF.
The data presents a vivid picture of median prices across Geneva’s neighborhoods. “Hermance”, “Céligny”, and “Genthod” lead, with medians of 176 CHF, 175 CHF, and 167 CHF respectively. It’s worth noting that “Hermance” has an outlier at 396 CHF influencing its average, given its 19 listings. Similarly, “Vandoeuvres” stands out with an outlier at 868 CHF, but a median of 140 CHF.
The “Commune de Genève”, home to a substantial 5,051 listings, displays a median of 103 CHF, with prices varying from 13 CHF to 880 CHF. Yet half of its listings fall within the interquartile range (IQR) of 78 CHF to 140 CHF, indicating a predominance of moderately priced accommodations.
On the other end, neighborhoods like “Jussy” and “Gy” offer affordable medians but house few listings. Meanwhile, “Vernier” and “Grand-Saconnex” present more budget-friendly options with medians of 66 CHF and 87 CHF, respectively, backed by a considerable number of listings.
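The medians and interquartile ranges quoted for the neighborhoods are the quantities a boxplot displays. As a small illustration on made-up prices, the sketch below computes them with base R and shows why the median is preferred over the mean when a single outlier is present.

```r
# Hypothetical prices for one neighborhood, including one outlier
price <- c(60, 78, 90, 103, 120, 140, 400)

# Quartiles: the box of a boxplot spans the 25 % and 75 % quantiles
q <- quantile(price, probs = c(0.25, 0.5, 0.75))
iqr_width <- IQR(price)

# The outlier pulls the mean upwards but leaves the median untouched
mean_price   <- mean(price)
median_price <- median(price)
```

By construction roughly half of the observations lie between the lower and upper quartile, so a narrow IQR signals that most listings in a neighborhood are priced close to one another.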
# Remove outliers, get median
sorted_neighbourhoods <- listings %>%
filter(price_swiss_franc <= 1000 & 0 < price_swiss_franc) %>%
group_by(neighbourhood_cleansed) %>%
summarize(median_price_swiss_franc = median(price_swiss_franc)) %>%
arrange(desc(median_price_swiss_franc))
sorted_neighbourhood_names <-
sorted_neighbourhoods$neighbourhood_cleansed
# boxplot
# palette for x axes
palette <- viridis(length(sorted_neighbourhood_names))
my_colors <-
scale_color_manual(values = setNames(palette, sorted_neighbourhood_names))
price_neighbourhood_plot <- ggplot(
data = filter(listings, price_swiss_franc <= 1000 &
0 < price_swiss_franc),
aes(
x = factor(neighbourhood_cleansed, levels = sorted_neighbourhood_names),
y = price_swiss_franc,
color = neighbourhood_cleansed
)
) +
geom_boxplot() +
geom_jitter(alpha = 0.5, position = position_jitter(width = 0.1), size = 0.5) +
my_colors +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") +
xlab("Neighbourhood") +
ylab("Price (CHF)") +
ggtitle("Price by Neighbourhoods")
price_neighbourhood_plot_def <-
price_neighbourhood_plot + theme(axis.text.x = element_text(
angle = 45,
hjust = 1,
color = palette
))
print(price_neighbourhood_plot_def)
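The medians, interquartile ranges, and listing counts quoted above can all be obtained from one grouped summary. The following is a minimal sketch in base R on made-up data: the data frame `toy_listings`, its values, and the helper `summarise_prices` are purely illustrative assumptions, and only the column names mirror those used in the analysis.

```r
# Toy data: invented prices, NOT taken from the Geneva data set
toy_listings <- data.frame(
  neighbourhood_cleansed = c("A", "A", "A", "A", "B", "B", "B"),
  price_swiss_franc = c(80, 100, 120, 1500, 60, 70, 90)
)

# Keep only listings with 0 < price <= 1000 CHF, as in the analysis
in_range <- subset(toy_listings,
                   price_swiss_franc > 0 & price_swiss_franc <= 1000)

# Hypothetical helper: count, median, and quartiles for one group of prices
summarise_prices <- function(prices) {
  c(
    n_listings = length(prices),
    median = median(prices),
    q25 = unname(quantile(prices, 0.25)),
    q75 = unname(quantile(prices, 0.75))
  )
}

# One row per neighbourhood
price_summary <- do.call(
  rbind,
  lapply(split(in_range$price_swiss_franc, in_range$neighbourhood_cleansed),
         summarise_prices)
)
print(price_summary)
```

Reporting the listing count next to each median makes it easier to judge how much weight a neighbourhood's figure deserves, which matters for small municipalities with only a handful of listings.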
Shifting our focus to the types of properties, the diversity in Geneva’s accommodations becomes evident. Unique stays like the “Houseboat” top the pricing at 431 CHF, but with a mere 3 listings. Similarly, the “Private room in serviced apartment”, despite its median price of 309 CHF and outliers around 865 CHF, has only 9 listings. “Entire villa” and “Room in aparthotel” are also limited in number, emphasizing their niche nature.
Contrastingly, the prevalent “Entire rental unit” boasts 3,975 listings and a median price of 108 CHF. Its prices span from a modest 13 CHF up to 917 CHF, but most hover within the IQR of 13 CHF to 149 CHF. This points to a preference for rental units, which seemingly offer a good mix of affordability and privacy.
# listings per property type
num_listings_property_type <- data.frame(
listings %>%
group_by(property_type) %>%
summarise(count_id = n()) %>%
arrange(desc(count_id))
)
# Remove outliers (filtered data (0-1000 chf)), get median
sorted_price_swiss_franc <- listings %>%
filter(price_swiss_franc <= 1000 & 0 < price_swiss_franc) %>%
group_by(property_type) %>%
summarise(median_price_swiss_franc = median(price_swiss_franc, na.rm = TRUE)) %>%
arrange(desc(median_price_swiss_franc))
sorted_property_type_names <-
sorted_price_swiss_franc$property_type
# boxplot
# palette for x axes
palette <- viridis(length(sorted_property_type_names))
my_colors <-
scale_color_manual(values = setNames(palette, sorted_property_type_names))
price_property_type <-
ggplot(
data = filter(listings, price_swiss_franc <= 1000 &
0 < price_swiss_franc),
aes(
x = factor(property_type, levels = sorted_property_type_names),
y = price_swiss_franc,
color = property_type
)
) +
geom_boxplot() +
geom_jitter(alpha = 0.5, position = position_jitter(width = 0.1), size = 0.5) +
my_colors +
scale_x_discrete(limits = sorted_price_swiss_franc$property_type) +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") +
xlab("Property type") +
ylab("Price (CHF)") +
ggtitle("Price by Property Type")
price_property_type_def <-
price_property_type + theme(axis.text.x = element_text(
angle = 45,
hjust = 1,
color = palette
))
print(price_property_type_def)
Room-type pricing shows a clear pattern. “Hotel room” has the highest median price at 193 CHF, although it comprises a mere 17 listings. “Entire home/apt” is clearly the crowd favorite with 4,908 listings, a median price of 126 CHF, and most offerings falling between 100 and 179 CHF. Its prices nevertheless span an extensive range, from 15 CHF to 1,000 CHF, with several high-priced outliers. Conversely, “Private room” provides a pocket-friendly median of 75 CHF but isn’t without its luxury outliers. Finally, the “Shared room” category is the most economical, with a median of 56 CHF, but it is rare, with just 40 listings.
Overall, the data reflects a marked preference for entire homes or apartments, underscoring the prominence of privacy for visitors or renters in Geneva.
# listings per room type
num_listings_room_type <- data.frame(
listings %>%
group_by(room_type) %>%
summarise(count_id = n()) %>%
arrange(desc(count_id))
)
# Remove outliers (keep 0 < price <= 1000), get median per room type
# Note: this block filters on `price`, whereas the blocks above use `price_swiss_franc`
sorted_price <- listings %>%
filter(price <= 1000 & price > 0) %>%
group_by(room_type) %>%
summarise(median_price = median(price, na.rm = TRUE)) %>%
arrange(desc(median_price))
sorted_room_types <- sorted_price$room_type
# boxplot
# palette for x axes
palette <- viridis(length(sorted_room_types))
my_colors <-
scale_color_manual(values = setNames(palette, sorted_room_types))
price_room_type <-
ggplot(data = filter(listings, price <= 1000 & price > 0),
aes(
x = factor(room_type, levels = sorted_room_types),
y = price,
color = room_type
)) +
geom_violin(draw_quantiles = c(0.25, 0.5, 0.75)) +
my_colors +
geom_jitter(alpha = 0.1, position = position_jitter(width = 0.1), size = 0.5, color = "darkgrey") +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none") +
xlab("Room type") +
ylab("Price (CHF)") +
ggtitle("Price by Room Type") +
coord_flip()
price_room_type_def <-
price_room_type + theme(axis.text.y = element_text(color = palette))
print(price_room_type_def)
To explore amenities by price, we focus on 10 amenities highlighted as important in a consumer survey by Airbnb. In listings under 1,000 CHF, WiFi is dominant at 94%. Other prevalent amenities are washers (75%) and refrigerators (64.47%). However, pools and air conditioners are sparse, with 2.44% and 0.04% of listings offering them, respectively.
For accommodations above 1,000 CHF, there’s a marginal drop in WiFi and shower gel/shampoo inclusions at 91.53 % and 54.24 %. Still, amenities like free parking, pools, and pet allowances jump to around 45 %. Interestingly, irrespective of the price range, air conditioning is virtually absent in Geneva rentals. This implies that while luxury spaces lean towards amenities like pools, essentials like WiFi are consistent must-haves in both categories.
# Create a function to calculate percentages
calculate_percentages <- function(data) {
sums_of_columns <-
colSums(data[, c(
"showergel_or_shampoo",
"wifi",
"freeparking",
"pool",
"dishwasher",
"washer",
"selfcheckin",
"petsallowed",
"refrigerator",
"airconditioner"
)])
total_rows <- nrow(data)
percentage_columns <- sums_of_columns / total_rows * 100
return(percentage_columns)
}
# Calculate percentages for listings under 1000
listings_under_1000 <- listings %>%
filter(price_swiss_franc >= 0 & price_swiss_franc <= 1000)
percentages_under_1000 <- calculate_percentages(listings_under_1000)
# Calculate percentages for listings over 1000
listings_over_1000 <- listings %>%
filter(price_swiss_franc > 1000)
percentages_over_1000 <- calculate_percentages(listings_over_1000)
# Create a data frame for ggplot
d.amenties <- data.frame(
category = rep(
c(
"showergel_or_shampoo",
"wifi",
"freeparking",
"pool",
"dishwasher",
"washer",
"selfcheckin",
"petsallowed",
"refrigerator",
"airconditioner"
), 2
),
price_category = c(rep("< 1000", 10), rep("> 1000", 10)),
percentage = c(percentages_under_1000, percentages_over_1000)
)
# Plotting
amenties_percent_comparison <-
ggplot(d.amenties, aes(x = category, y = percentage, color = price_category, group = price_category)) +
geom_line(aes(linetype = price_category)) +
geom_point() +
labs(
title = "Amenities Percentage by Price Category",
x = "Amenities",
y = "Percentage (%)"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_color_manual(
name = "Price Category",
values = c("< 1000" = "blue", "> 1000" = "red"),
labels = c("< 1000" = "Under 1000", "> 1000" = "Over 1000")
) +
scale_linetype_manual(
name = "Price Category", # Use the same name as in scale_color_manual
values = c("< 1000" = "solid", "> 1000" = "dashed"),
labels = c("< 1000" = "Under 1000", "> 1000" = "Over 1000")
)
amenties_percent_comparison
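The percentages above are simply the share of listings whose amenity indicator is true. As a small illustration, assuming the amenity columns are logical (TRUE/FALSE), the sum-based calculation used in `calculate_percentages` matches a column mean. The data frame `toy_amenities` and its values are invented for this sketch.

```r
# Toy indicator columns (TRUE = listing offers the amenity); values are invented
toy_amenities <- data.frame(
  wifi = c(TRUE, TRUE, TRUE, FALSE),
  pool = c(FALSE, FALSE, TRUE, FALSE)
)

# Sum of TRUE values divided by the number of rows, as a percentage
share_by_sum <- colSums(toy_amenities) / nrow(toy_amenities) * 100

# Equivalent shortcut: the mean of a logical column is the share of TRUE values
share_by_mean <- colMeans(toy_amenities) * 100

print(share_by_sum)
stopifnot(isTRUE(all.equal(share_by_sum, share_by_mean)))
```

Both forms return `NA` for a column that contains missing values unless `na.rm = TRUE` is passed, so it is worth checking the indicator columns for gaps before comparing price categories.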
In the above 1,000 CHF category, accommodations with pools fetch the top median price at 3,212 CHF, underscoring pools as a luxury in Geneva. Refrigerators and self-check-ins are next, priced at 2,328 CHF, while highly prevalent amenities like WiFi command a lesser 1,970 CHF. Pet-friendly spaces, though luxurious, have a more competitive median of 1,428 CHF.
In the sub-1,000 CHF bracket, amenity-driven price differences are subtler. Listings with pools have a median price of 139 CHF. Other amenities like dishwashers and free parking range around 110-114 CHF. Interestingly, the rare air conditioning has the lowest median at 61.6 CHF, although this figure rests on very few listings and should be read with caution. Overall, amenities impact pricing differently across price categories.
# create dataframe
bplot_df <- listings %>%
select(c("price_swiss_franc", "showergel_or_shampoo", "wifi", "pool", "freeparking","dishwasher","washer","selfcheckin","petsallowed","refrigerator","airconditioner"))
# change to long format
new_listings_bplot <- bplot_df %>%
pivot_longer(cols= -price_swiss_franc, names_to = "Category", values_to = "Value")
# Filter rows with true
new_listings_bplot_t <- new_listings_bplot %>%
filter(Value == TRUE)
# subset data into under and over 1000
under_1000_data <- new_listings_bplot_t %>% filter(price_swiss_franc <= 1000)
over_1000_data <- new_listings_bplot_t %>% filter(price_swiss_franc > 1000)
# Calculate median order for "Under 1000"
order_under_1000 <- under_1000_data %>%
group_by(Category) %>%
summarize(median_price = median(price_swiss_franc, na.rm = TRUE)) %>%
arrange(-median_price) %>%
pull(Category)
under_1000_data$Category <- factor(under_1000_data$Category, levels = order_under_1000)
# Calculate median order for "Over 1000"
order_over_1000 <- over_1000_data %>%
group_by(Category) %>%
summarize(median_price = median(price_swiss_franc, na.rm = TRUE)) %>%
arrange(-median_price) %>%
pull(Category)
over_1000_data$Category <- factor(over_1000_data$Category, levels = order_over_1000)
# Plot "Under 1000"
plot_under_1000 <- ggplot(data = under_1000_data, aes(y = price_swiss_franc, x = Category)) +
geom_boxplot(fill = "blue") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Listings under 1000", x = "Amenities", y = "Price (CHF)")
# Plot "Over 1000"
plot_over_1000 <- ggplot(data = over_1000_data, aes(y = price_swiss_franc, x = Category)) +
geom_boxplot(fill = "red") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Listings over 1000", x = "Amenities", y = "Price (CHF)")
# plot side by side
grid.arrange(plot_under_1000, plot_over_1000, ncol=2, top = "Median Prices by Amenities")
For a more detailed exploration across room types, neighborhoods, and other categories, we invite readers to dive into the interactive visualizations in our Shiny app.
Following our visual analysis of the Geneva housing market, we aim to predict future occupancy trends. While our plots highlighted relationships between price and occupancy, other factors like the time of year and public holidays are also crucial. For predicting occupancy, we’ve chosen a logistic regression model, given that occupancy is binary: a property is either occupied or not. Logistic regression is adept at handling such binary outcomes. This approach aligns with the work of Lu [1], who also employed a logistic regression model for predicting occupancy in this data set, albeit with different predictor variables. Importantly, our focus is predicting occupancy based on price, not vice versa. The logical causality is that price, along with other variables, determines occupancy. Our model seeks to understand how price, combined with these factors, influences the likelihood of a listing being occupied.
Our logistic regression model is formulated as follows:
\[ P(\text{occupancy} = 1) = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 \times X_1 + \beta_2 \times X_2 + \beta_3 \times X_3 + \beta_4 \times X_4))} \]
Where \(X_1\), \(X_2\), \(X_3\), and \(X_4\) correspond to the predictors log_CHF, month, dayweek, and is_holiday (the categorical predictors month and dayweek enter the model as sets of dummy variables), and \(\beta_0, \dots, \beta_4\) are the coefficients to be estimated.
The price variable (CHF) was transformed before modeling. During the initial visual analysis in the descriptive statistics phase, we observed its distribution and subsequently inspected it with a Q-Q plot. This revealed a skewed distribution, as seen in the graphic below. To address this, we applied a logarithmic transformation. The transformation made the price variable less skewed and more suitable for modeling, because extreme prices then have less influence on the fit.
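The effect of the logarithmic transformation on skewness can be illustrated on simulated data. The sketch below does not use the Geneva prices: `toy_price` is drawn from a log-normal distribution chosen for illustration, and `sample_skewness` is a hypothetical helper computing the third standardised moment.

```r
set.seed(1)  # reproducible toy data

# Simulated right-skewed prices (log-normal); NOT the Geneva prices
toy_price <- rlnorm(10000, meanlog = log(100), sdlog = 1)

# Sample skewness: third standardised moment (0 for a symmetric distribution)
sample_skewness <- function(x) {
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

skew_raw <- sample_skewness(toy_price)
skew_log <- sample_skewness(log(toy_price))

cat("Skewness of raw prices:", round(skew_raw, 2), "\n")
cat("Skewness of log prices:", round(skew_log, 2), "\n")
```

For strictly positive, right-skewed data of this kind the log scale pulls in the long upper tail, which is the same pattern the Q-Q plots are meant to show for the actual prices.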
# Variable inclusion
calendar_df_cor <- calendar %>%
select(price_swiss_franc,
listing_id,
date,
occupancy,
month_name,
dayweek,
is_holiday) %>%
drop_na() %>%
mutate(dayweek = as.factor(dayweek),
month_name = as.factor(month_name))
# Convert month_name to character and back to factor
calendar_df_cor$month_name <-
as.factor(as.character(calendar_df_cor$month_name))
# Set contrasts for the month_name factor to treatment contrasts to actually get the months in the LM
contrasts(calendar_df_cor$month_name) <-
contr.treatment(levels(calendar_df_cor$month_name))
# Set reference level of month to january
calendar_df_cor$month <-
relevel(calendar_df_cor$month_name, ref = "Jan")
# exclude month_name
calendar_df_cor <- calendar_df_cor %>%
select(-month_name)
calendar_df_cor$log_CHF <- log(calendar_df_cor$price_swiss_franc)
# Data for plotting
calendar_qqplot <- data.frame(
Original = calendar_df_cor$price_swiss_franc,
LogTransformed = calendar_df_cor$log_CHF
)
# Original Data
plot1 <- ggplot(calendar_qqplot, aes(sample = Original)) +
stat_qq() +
stat_qq_line(color="red", linetype="dashed") +
labs(title = "Q-Q Plot of CHF", x = "Theoretical Quantiles", y = "Sample Quantiles") +
theme_minimal()
# Log-transformed Data
plot2 <- ggplot(calendar_qqplot, aes(sample = LogTransformed)) +
stat_qq() +
stat_qq_line(color="blue", linetype="dashed") +
labs(title = "Q-Q Plot of Log-transformed CHF", x = "Theoretical Quantiles", y = "Sample Quantiles") +
theme_minimal()
# Arrange the plots side by side
grid.arrange(plot1, plot2, ncol = 2)
# Some further diagnostics
# psych::pairs.panels(calendar_df_cor)
# Point-biserial correlation
cor.test(calendar_df_cor$occupancy,
calendar_df_cor$log_CHF,
method = "pearson")
# Cramérs V
assoc_stats_month <-
assocstats(table(calendar_df_cor$occupancy, calendar_df_cor$month))
assoc_stats_dayweek <-
assocstats(table(calendar_df_cor$occupancy, calendar_df_cor$dayweek))
assoc_stats_holiday <-
assocstats(table(calendar_df_cor$occupancy, calendar_df_cor$is_holiday))
assoc_stats_month$cramer
assoc_stats_dayweek$cramer
assoc_stats_holiday$cramer
# custom theme
publication_layout = theme_bw() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
axis.line = element_line(),
text = element_text(family = 'Times New Roman')
)
# Logistic regression Model
logistic_model_0 <-
glm(
occupancy ~ log_CHF + month + dayweek + is_holiday,
family = binomial(),
data = calendar_df_cor
)
# Readable coefficient names for the summary table
rename_coefs <- c(
"Log Chf" = "log_CHF",
"February" = "monthFeb",
"March" = "monthMar",
"April" = "monthApr",
"May" = "monthMay",
"June" = "monthJun",
"July" = "monthJul",
"Monday" = "dayweekMonday",
"Saturday" = "dayweekSaturday",
"Sunday" = "dayweekSunday",
"Thursday" = "dayweekThursday",
"Tuesday" = "dayweekTuesday",
"Wednesday" = "dayweekWednesday",
"Holiday" = "is_holiday"
)
# summary(logistic_model_0)
# summ(logistic_model_0, exp = TRUE, scale = TRUE)
# prepare nice summary table
export_summs(
logistic_model_0,
coefs = rename_coefs,
exp = TRUE,
scale = TRUE,
error_format = "({conf.low}-{conf.high})",
error_pos = "right",
model.names = "Adjusted Odds Ratios"
)
Predictor | Adjusted Odds Ratio | Confidence interval |
---|---|---|
Log Chf | 0.84 *** | (0.84-0.85) |
February | 0.78 *** | (0.76-0.79) |
March | 0.71 *** | (0.69-0.72) |
April | 1.17 *** | (1.15-1.19) |
May | 1.05 *** | (1.03-1.07) |
June | 0.83 *** | (0.82-0.85) |
July | 1.15 *** | (1.13-1.17) |
Monday | 0.96 *** | (0.94-0.97) |
Saturday | 0.95 *** | (0.93-0.96) |
Sunday | 0.93 *** | (0.92-0.95) |
Thursday | 1.02 ** | (1.01-1.04) |
Tuesday | 0.97 ** | (0.96-0.99) |
Wednesday | 0.98 * | (0.96-1.00) |
Holiday | 0.89 *** | (0.88-0.90) |
N | 837621 | |
AIC | 1123199.25 | |
BIC | 1123373.82 | |
Pseudo R2 | 0.02 | |
All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. |
The logistic regression model, fitted to 837,621 observations, predicts occupancy from several factors using a logit link function. A key finding is the inverse relationship between the logarithm of the price and occupancy: because continuous predictors are standardized, a one-standard-deviation increase in the logarithm of the price is associated with an adjusted odds ratio (aOR) of 0.84, suggesting that higher-priced properties are less likely to be occupied. Occupancy also exhibits noticeable seasonal variation. For instance, April has an aOR of 1.17, while February and June show decreased odds compared to the reference month, January, highlighting the ebb and flow of demand over the year. The day of the week plays a role too: relative to Friday, the reference day, Sundays are slightly less popular (aOR 0.93), whereas Thursdays see a slight increase (aOR 1.02). Another notable insight is the reduced likelihood (aOR of 0.89) of properties being occupied during public holidays. Every predictor in this model is statistically significant at the 5% level, although with a sample this large even small effects reach significance.
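Odds ratios are easier to interpret once translated into probabilities. The sketch below is an illustration only: the helper `apply_odds_ratio` is hypothetical, and the baseline of 0.59 is the approximate overall share of occupied listing-days in the calendar data, not the model's predicted probability for the reference categories.

```r
# Hypothetical helper: probability implied by a baseline probability and an odds ratio
apply_odds_ratio <- function(baseline_prob, odds_ratio) {
  baseline_odds <- baseline_prob / (1 - baseline_prob)
  new_odds <- baseline_odds * odds_ratio
  new_odds / (1 + new_odds)
}

# Illustrative baseline: roughly the overall share of occupied listing-days
baseline <- 0.59

# aOR of 0.84 for log price, taken from the table above
prob_after <- apply_odds_ratio(baseline, 0.84)
print(round(prob_after, 3))
```

Under these assumptions, an aOR of 0.84 moves the occupancy probability from about 0.59 to about 0.55, a modest shift that fits the small effect sizes reported in the table.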
plot_summs(
logistic_model_0,
coefs = rename_coefs,
scale = TRUE,
# Standardize coefficients for comparison
plot.distributions = FALSE,
exp = TRUE
# Exponentiate coefficients to show adjusted Odds Ratios
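) + publication_layout +
# Apply the pre-defined publication layout and a log-scaled x axis
# Axis limits must cover all estimates, including February and March (below 0.8)
scale_x_continuous(breaks = c(0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3),
limits = c(0.65, 1.3)) +
coord_trans(x = "log10") +
labs(x = "\nAdjusted Odds Ratios\n", y = NULL)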
However, it’s crucial to approach these findings with a level of caution. While the model considers several relevant factors, it doesn’t account for external factors like holidays in neighboring countries or global events which could influence Geneva’s occupancy rates. Moreover, the model’s pseudo \(R^2\) values are quite low. This underlines that the model explains only a small fraction of the variance in occupancy, suggesting potential gaps. Thus, a more comprehensive model might be beneficial to account for these omitted variables.
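A statistically significant coefficient and a low pseudo \(R^2\) can coexist, which a small simulation makes concrete. The sketch below uses McFadden's definition, one common variant; which variant the summary table reports is not stated here, so this is an assumption. The data are simulated, and the coefficient values are chosen only for illustration.

```r
set.seed(42)  # reproducible toy data

# Simulated data with a weak price effect; NOT the Geneva calendar data
n <- 5000
toy <- data.frame(log_price = rnorm(n))
toy$occupied <- rbinom(n, size = 1, prob = plogis(0.4 - 0.17 * toy$log_price))

# Full model and intercept-only (null) model
full_model <- glm(occupied ~ log_price, family = binomial(), data = toy)
null_model <- glm(occupied ~ 1, family = binomial(), data = toy)

# McFadden's pseudo R^2: 1 - logLik(full) / logLik(null)
mcfadden_r2 <- 1 - as.numeric(logLik(full_model)) / as.numeric(logLik(null_model))

# Odds ratio for log_price, comparable in spirit to the aOR column
odds_ratio <- exp(coef(full_model)[["log_price"]])

print(round(c(mcfadden_r2 = mcfadden_r2, odds_ratio = odds_ratio), 3))
```

In a setup like this the estimated odds ratio is close to the simulated effect while the pseudo \(R^2\) stays near zero, because most of the variation in a binary outcome remains unexplained by a weak predictor.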
Building on our previous analyses of categories like room type, neighborhood, property type, and amenities, the geospatial plots add a further dimension. They show the exact locations of listings and offer clear spatial insight into where listings are concentrated. The density of Airbnb listings is notably higher in the city center than in the rest of the canton, and availability progressively diminishes as one moves outward from the city’s heart. The city center has roughly 5,000 Airbnb listings, whereas the surrounding municipalities typically have between 100 and 200 listings each.
Individual municipalities were also compared using the Price by Neighborhoods plot. It revealed that median prices in municipalities located farther from the city center tend to be lower. Examples of such municipalities include “Jussy”, “Gy”, “Bardonnex”, and “Chancy”. Location thus significantly influences both the price and the number of Airbnb listings within the municipalities. Additionally, municipalities situated directly along the shores of Lake Geneva have the highest median prices of all. These include “Cologny”, “Bellevue”, “Genthod”, and “Céligny”.
# Functions
# Function to read shapefiles
read_shapefile <- function(filepath) {
return(st_read(filepath))
}
# Function to filter data by KANTONSNUMMER
filter_by_kantonsnummer <- function(shape_data, kantonsnummer) {
return(shape_data[shape_data$KANTONSNUMMER == kantonsnummer,])
}
# Function to transform shape data to longitude and latitude
transform_shape_data <- function(shape_data) {
transformed_data <- data.frame()
for (i in seq_along(shape_data$NAME)) {
temp_data <- shape_data %>%
filter(NAME == shape_data$NAME[i]) %>%
pull(Shape) %>%
st_transform(., "+proj=longlat") %>%
st_coordinates() %>%
as.data.frame() %>%
mutate(municipality = shape_data$NAME[i])
transformed_data <- rbind(transformed_data, temp_data)
}
return(transformed_data)
}
# Main code
base_path <- "../Data/swissBOUNDARIES3D_1_4_LV95_LN02.gdb/"
# Read shapefile data
shapefile_path <-
file.path(base_path, "a0000000a.gdbtable")
shapefile_data <- read_shapefile(shapefile_path)
# Filter shapefile data to keep only data for Genf (KANTONSNUMMER 25)
genf_data <- filter_by_kantonsnummer(shapefile_data, 25)
# Transform shape data to longitude and latitude
transformed_genf <- transform_shape_data(genf_data)
# Plot
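# Google API key registration and city location
# The key is read from an environment variable instead of being hard-coded
# (the variable name GOOGLE_MAPS_API_KEY is an assumption)
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")
register_google(key = api_key)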
city <- "Geneva"
city_location <- geocode(city)
city_map <-
get_googlemap(
center = c(lon = city_location$lon, lat = city_location$lat),
zoom = 11,
key = api_key
)
# Create ggplot
p <- ggmap(city_map) +
geom_point(
data = listings,
aes(x = longitude, y = latitude),
color = "blue",
alpha = 0.3
) +
ggtitle("Listings Location") +
geom_polygon(
data = transformed_genf,
aes(x = X, y = Y, group = municipality, fill = municipality),
color = "black",
alpha = 0.3,
size = 0.5
) +
scale_fill_viridis(discrete = TRUE) +
labs(x = "Longitude", y = "Latitude") +
theme(legend.position = "none")
print(p)
# Prepare the data for the density plot: compare municipality names in both data sets
listings_names <- unique(listings$neighbourhood_cleansed)
new_genf_names <- unique(transformed_genf$municipality)
diff_names <- setdiff(listings_names, new_genf_names)
sort(listings_names, decreasing = FALSE)
sort(new_genf_names, decreasing = FALSE)
# Rename the neighborhoods in listings to match the names used in the boundary data
listings_new <-
listings %>% mutate(
neighbourhood_cleansed = case_when(
neighbourhood_cleansed == "Commune de Genève" ~ "Genève",
neighbourhood_cleansed == "Grand-Saconnex" ~ "Le Grand-Saconnex",
neighbourhood_cleansed == "Carouge" ~ "Carouge (GE)",
neighbourhood_cleansed == "Corsier" ~ "Corsier (GE)",
TRUE ~ neighbourhood_cleansed
)
)
# check again difference
diff_names <-
setdiff(listings_new$neighbourhood_cleansed, new_genf_names)
diff_names
add_to_new_genf <-
listings_new %>% group_by(neighbourhood_cleansed) %>% summarize(count = n())
# add the count to new_genf with join
new_genf <-
left_join(transformed_genf,
add_to_new_genf,
by = c("municipality" = "neighbourhood_cleansed"))
# Compute the bounding-box center of each municipality to position its label
center <- new_genf %>%
group_by(municipality) %>%
mutate(center_x = (min(X) + max(X)) / 2 ,
center_y = (min(Y) + max(Y)) / 2)
unique_center <- center %>%
group_by(municipality) %>%
summarize(
municipality = unique(municipality),
center_x = unique(center_x),
center_y = unique(center_y)
)
# Get the map without text labels
# The key is read from an environment variable instead of being hard-coded
# (the variable name GOOGLE_MAPS_API_KEY is an assumption)
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")
register_google(key = api_key)
city <- "Geneva"
city_location <- geocode(city)
# original: zoom 11 and without size
city_map <-
get_googlemap(
center = c(lon = city_location$lon, lat = city_location$lat),
zoom = 11,
key = api_key,
style = "feature:all|element:labels|visibility:off"
)
# Plot the map with the names of the municipalities
p <- ggmap(city_map) +
geom_polygon(
data = new_genf,
aes(
x = X,
y = Y,
group = municipality,
fill = count
),
color = "black",
size = 1,
alpha = 0.6
) +
geom_text(
data = unique_center,
aes(
x = center_x,
y = center_y,
group = municipality,
label = municipality
),
color = "navyblue",
hjust = 0.5,
vjust = 0.5,
size = 3
) +
ggtitle("Listings Density")+
theme_void() +
scale_fill_viridis(
trans = "log",
# No break at 0: log(0) is undefined on a log-transformed scale
breaks = c(5, 10, 20, 30, 50, 100, 150, 200, 5000),
name = "Listings per Municipality",
option = "viridis",
discrete = FALSE,
guide = guide_legend(
keyheight = unit(10, units = "mm"),
keywidth = unit(12, units = "mm"),
label.position = "right",
title.position = "top"
)
)
print(p)
The room type was also briefly inspected geographically. Most hotel rooms are concentrated in the city center, while some are located near the airport. We suspected that villas might be located near the lake. However, this is not the case, as they are situated just outside the city.
Our exploration into Airbnb’s occupancy and listing dynamics in Geneva reveals multifaceted patterns driven both by predictable factors like public holidays and by less predictable ones such as location-specific demand. The geospatial analysis offers a vivid picture of how listings are spread across the canton. A noticeable concentration of Airbnb accommodations in the city center underlines the area’s popularity, while locations along Lake Geneva command top prices, testifying to the lake’s appeal. Interestingly, outlying communities are more affordable, underlining the economic disparity between the city center and its surrounding areas. The diversity in room types and amenities further enriches Geneva’s accommodation market, with entire homes or apartments emerging as a favorite. However, although its coefficients are statistically significant, our model’s limited explanatory power suggests that external factors also affect occupancy, so further analysis is warranted. In summary, Geneva’s Airbnb market is multifaceted and influenced by various factors ranging from seasonality to where listings are located.
calendar_codebook <- codebook::codebook_table(calendar)
kable(calendar_codebook,
caption = "Codebook for Calendar Data",
format = "html",
booktabs = TRUE) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "500px")
name | data_type | ordered | value_labels | n_missing | complete_rate | n_unique | empty | top_counts | min | median | max | mean | sd | whitespace | hist | label |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
listing_id | numeric | NA | NA | 0 | 1.0000000 | NA | NA | NA | 4.3e+04 | 4.3e+07 | 9.2e+17 | 1.972305e+17 | 3.202923e+17 | NA | ▇▁▁▂▁ | NA |
date | Date | NA | NA | 0 | 1.0000000 | 212 | NA | NA | 2023-01-01 | 2023-05-17 | 2023-07-31 | NA | NA | NA | NA | NA |
available | character | NA | NA | 0 | 1.0000000 | 2 | 0 | NA | 1 | NA | 1 | NA | NA | 0 | NA | NA |
price | numeric | NA | NA | 330 | 0.9996062 | NA | NA | NA | 1.0e+01 | 1.2e+02 | 9.0e+04 | 1.906869e+02 | 1.215533e+03 | NA | ▇▁▁▁▁ | NA |
adjusted_price | character | NA | NA | 0 | 1.0000000 | 1384 | 330 | NA | 0 | NA | 10 | NA | NA | 0 | NA | NA |
minimum_nights | numeric | NA | NA | 1 | 0.9999988 | NA | NA | NA | 1.0e+00 | 2.0e+00 | 1.1e+03 | 8.399278e+00 | 3.825786e+01 | NA | ▇▁▁▁▁ | NA |
maximum_nights | numeric | NA | NA | 1 | 0.9999988 | NA | NA | NA | 1.0e+00 | 1.1e+03 | 1.2e+03 | 6.984890e+02 | 4.808159e+02 | NA | ▃▃▁▁▇ | NA |
is_holiday | numeric | NA | NA | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 0.0e+00 | 1.0e+00 | 3.232743e-01 | 4.677267e-01 | NA | ▇▁▁▁▃ | NA |
price_swiss_franc | numeric | NA | NA | 330 | 0.9996062 | NA | NA | NA | 8.8e+00 | 1.0e+02 | 7.9e+04 | 1.678044e+02 | 1.069669e+03 | NA | ▇▁▁▁▁ | NA |
mean_occupancy | numeric | NA | NA | 0 | 1.0000000 | NA | NA | NA | 4.6e-01 | 5.9e-01 | 8.0e-01 | 5.893280e-01 | 5.847640e-02 | NA | ▂▇▇▁▁ | NA |
occupancy | numeric | NA | NA | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 1.0e+00 | 1.0e+00 | 5.893280e-01 | 4.919561e-01 | NA | ▆▁▁▁▇ | NA |
month_name | factor | TRUE | | 0 | 1.0000000 | 7 | NA | Jul: 214861, Jun: 140723, May: 140058, Apr: 135540 | NA | NA | NA | NA | NA | NA | NA | NA |
dayweek | character | NA | NA | 0 | 1.0000000 | 7 | 0 | NA | 6 | NA | 9 | NA | NA | 0 | NA | NA |
listings_codebook <- codebook::codebook_table(listings)
kable(listings_codebook,
caption = "Codebook for Listings Data",
format = "html",
booktabs = TRUE) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed")) %>%
scroll_box(width = "100%", height = "500px")
name | data_type | n_missing | complete_rate | n_unique | empty | count | min | median | max | mean | sd | whitespace | hist | label |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | numeric | 0 | 1.0000000 | NA | NA | NA | 4.3e+04 | 4.4e+07 | 9.2e+17 | 2.253646e+17 | 3.425411e+17 | NA | ▇▁▁▂▂ | NA |
listing_url | character | 0 | 1.0000000 | 2976 | 0 | NA | 34 | NA | 47 | NA | NA | 0 | NA | NA |
scrape_id | numeric | 0 | 1.0000000 | NA | NA | NA | 2.0e+13 | 2.0e+13 | 2.0e+13 | 2.022748e+13 | 4.334162e+09 | NA | ▃▁▁▁▇ | NA |
last_scraped | character | 0 | 1.0000000 | 5 | 0 | NA | 10 | NA | 10 | NA | NA | 0 | NA | NA |
source | character | 0 | 1.0000000 | 2 | 0 | NA | 11 | NA | 15 | NA | NA | 0 | NA | NA |
name | character | 0 | 1.0000000 | 4055 | 0 | NA | 2 | NA | 122 | NA | NA | 0 | NA | NA |
description | character | 0 | 1.0000000 | 3141 | 208 | NA | 0 | NA | 1014 | NA | NA | 0 | NA | NA |
neighborhood_overview | character | 0 | 1.0000000 | 1326 | 3520 | NA | 0 | NA | 1000 | NA | NA | 0 | NA | NA |
picture_url | character | 0 | 1.0000000 | 3117 | 0 | NA | 62 | NA | 126 | NA | NA | 0 | NA | NA |
host_id | numeric | 0 | 1.0000000 | NA | NA | NA | 6.8e+04 | 5.5e+07 | 5.2e+08 | 1.337798e+08 | 1.523743e+08 | NA | ▇▂▁▁▁ | NA |
host_url | character | 0 | 1.0000000 | 1981 | 0 | NA | 39 | NA | 43 | NA | NA | 0 | NA | NA |
host_name | character | 0 | 1.0000000 | 1354 | 0 | NA | 1 | NA | 24 | NA | NA | 0 | NA | NA |
host_since | character | 0 | 1.0000000 | 1534 | 0 | NA | 10 | NA | 10 | NA | NA | 0 | NA | NA |
host_location | character | 0 | 1.0000000 | 195 | 1118 | NA | 0 | NA | 31 | NA | NA | 0 | NA | NA |
host_about | character | 0 | 1.0000000 | 868 | 3381 | NA | 0 | NA | 3006 | NA | NA | 4 | NA | NA |
host_response_time | character | 0 | 1.0000000 | 5 | 0 | NA | 3 | NA | 18 | NA | NA | 0 | NA | NA |
host_response_rate | character | 0 | 1.0000000 | 66 | 0 | NA | 2 | NA | 4 | NA | NA | 0 | NA | NA |
host_acceptance_rate | character | 0 | 1.0000000 | 97 | 0 | NA | 2 | NA | 4 | NA | NA | 0 | NA | NA |
host_is_superhost | character | 0 | 1.0000000 | 3 | 623 | NA | 0 | NA | 1 | NA | NA | 0 | NA | NA |
host_thumbnail_url | character | 0 | 1.0000000 | 2017 | 0 | NA | 55 | NA | 131 | NA | NA | 0 | NA | NA |
host_picture_url | character | 0 | 1.0000000 | 2017 | 0 | NA | 57 | NA | 134 | NA | NA | 0 | NA | NA |
host_neighbourhood | character | 0 | 1.0000000 | 33 | 6843 | NA | 0 | NA | 28 | NA | NA | 0 | NA | NA |
host_listings_count | numeric | 0 | 1.0000000 | NA | NA | NA | 1.0e+00 | 2.0e+00 | 8.0e+02 | 1.377149e+01 | 3.712319e+01 | NA | ▇▁▁▁▁ | NA |
host_total_listings_count | numeric | 0 | 1.0000000 | NA | NA | NA | 1.0e+00 | 2.0e+00 | 8.0e+02 | 2.550880e+01 | 7.453493e+01 | NA | ▇▁▁▁▁ | NA |
host_verifications | character | 0 | 1.0000000 | 6 | 0 | NA | 2 | NA | 32 | NA | NA | 0 | NA | NA |
host_has_profile_pic | character | 0 | 1.0000000 | 2 | 0 | NA | 1 | NA | 1 | NA | NA | 0 | NA | NA |
host_identity_verified | character | 0 | 1.0000000 | 2 | 0 | NA | 1 | NA | 1 | NA | NA | 0 | NA | NA |
neighbourhood | character | 0 | 1.0000000 | 95 | 3520 | NA | 0 | NA | 51 | NA | NA | 0 | NA | NA |
neighbourhood_cleansed | character | 0 | 1.0000000 | 41 | 0 | NA | 2 | NA | 18 | NA | NA | 0 | NA | NA |
neighbourhood_group_cleansed | logical | 6932 | 0.0000000 | NA | NA | : | NA | NA | NA | NaN | NA | NA | NA | NA |
latitude | numeric | 0 | 1.0000000 | NA | NA | NA | 4.6e+01 | 4.6e+01 | 4.6e+01 | 4.620679e+01 | 1.971070e-02 | NA | ▁▇▁▁▁ | NA |
longitude | numeric | 0 | 1.0000000 | NA | NA | NA | 6.0e+00 | 6.1e+00 | 6.3e+00 | 6.143610e+00 | 2.653530e-02 | NA | ▁▁▇▃▁ | NA |
property_type | character | 0 | 1.0000000 | 44 | 0 | NA | 4 | NA | 34 | NA | NA | 0 | NA | NA |
room_type | character | 0 | 1.0000000 | 4 | 0 | NA | 10 | NA | 15 | NA | NA | 0 | NA | NA |
accommodates | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 2.0e+00 | 1.5e+01 | 2.716676e+00 | 1.582236e+00 | NA | ▇▃▁▁▁ | NA |
bathrooms | logical | 6932 | 0.0000000 | NA | NA | : | NA | NA | NA | NaN | NA | NA | NA | NA |
bathrooms_text | character | 0 | 1.0000000 | 24 | 6 | NA | 0 | NA | 17 | NA | NA | 0 | NA | NA |
bedrooms | numeric | 1295 | 0.8131852 | NA | NA | NA | 1.0e+00 | 1.0e+00 | 1.2e+01 | 1.380699e+00 | 7.872543e-01 | NA | ▇▁▁▁▁ | NA |
beds | numeric | 129 | 0.9813907 | NA | NA | NA | 1.0e+00 | 1.0e+00 | 1.2e+01 | 1.625753e+00 | 1.063492e+00 | NA | ▇▁▁▁▁ | NA |
amenities | character | 0 | 1.0000000 | 6605 | 0 | NA | 2 | NA | 1721 | NA | NA | 0 | NA | NA |
price | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 1.1e+02 | 9.0e+04 | 1.932773e+02 | 1.280425e+03 | NA | ▇▁▁▁▁ | NA |
minimum_nights | numeric | 0 | 1.0000000 | NA | NA | NA | 1.0e+00 | 2.0e+00 | 1.1e+03 | 8.280439e+00 | 4.044576e+01 | NA | ▇▁▁▁▁ | NA |
maximum_nights | numeric | 0 | 1.0000000 | NA | NA | NA | 1.0e+00 | 3.6e+02 | 1.2e+03 | 5.534494e+02 | 4.816618e+02 | NA | ▇▅▁▁▇ | NA |
minimum_minimum_nights | numeric | 1 | 0.9998557 | NA | NA | NA | 1.0e+00 | 2.0e+00 | 1.1e+03 | 8.013129e+00 | 4.026809e+01 | NA | ▇▁▁▁▁ | NA |
maximum_minimum_nights | numeric | 1 | 0.9998557 | NA | NA | NA | 1.0e+00 | 3.0e+00 | 1.1e+03 | 8.712018e+00 | 4.060807e+01 | NA | ▇▁▁▁▁ | NA |
minimum_maximum_nights | numeric | 1 | 0.9998557 | NA | NA | NA | 1.0e+00 | 1.1e+03 | 1.2e+03 | 6.831202e+02 | 4.823997e+02 | NA | ▅▃▁▁▇ | NA |
maximum_maximum_nights | numeric | 1 | 0.9998557 | NA | NA | NA | 1.0e+00 | 1.1e+03 | 1.2e+03 | 6.969799e+02 | 4.780791e+02 | NA | ▃▃▁▁▇ | NA |
minimum_nights_avg_ntm | numeric | 1 | 0.9998557 | NA | NA | NA | 1.0e+00 | 2.1e+00 | 1.1e+03 | 8.426244e+00 | 4.041881e+01 | NA | ▇▁▁▁▁ | NA |
maximum_nights_avg_ntm | numeric | 1 | 0.9998557 | NA | NA | NA | 1.0e+00 | 1.1e+03 | 1.2e+03 | 6.928540e+02 | 4.777958e+02 | NA | ▃▃▁▁▇ | NA |
calendar_updated | logical | 6932 | 0.0000000 | NA | NA | : | NA | NA | NA | NaN | NA | NA | NA | NA |
has_availability | character | 0 | 1.0000000 | 2 | 0 | NA | 1 | NA | 1 | NA | NA | 0 | NA | NA |
availability_30 | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 5.0e+00 | 3.0e+01 | 9.664166e+00 | 1.083220e+01 | NA | ▇▂▂▂▂ | NA |
availability_60 | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 1.6e+01 | 6.0e+01 | 2.211671e+01 | 2.212574e+01 | NA | ▇▂▂▂▃ | NA |
availability_90 | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 2.9e+01 | 9.0e+01 | 3.641200e+01 | 3.368848e+01 | NA | ▇▂▂▂▅ | NA |
availability_365 | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 1.2e+02 | 3.6e+02 | 1.539981e+02 | 1.380370e+02 | NA | ▇▂▂▂▅ | NA |
calendar_last_scraped | character | 0 | 1.0000000 | 5 | 0 | NA | 10 | NA | 10 | NA | NA | 0 | NA | NA |
number_of_reviews | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 6.0e+00 | 6.8e+02 | 2.568104e+01 | 5.409235e+01 | NA | ▇▁▁▁▁ | NA |
number_of_reviews_ltm | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 2.0e+00 | 1.6e+02 | 7.507069e+00 | 1.484481e+01 | NA | ▇▁▁▁▁ | NA |
number_of_reviews_l30d | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 0.0e+00 | 1.7e+01 | 6.602712e-01 | 1.599530e+00 | NA | ▇▁▁▁▁ | NA |
first_review | character | 0 | 1.0000000 | 1436 | 1336 | NA | 0 | NA | 10 | NA | NA | 0 | NA | NA |
last_review | character | 0 | 1.0000000 | 852 | 1336 | NA | 0 | NA | 10 | NA | NA | 0 | NA | NA |
review_scores_rating | numeric | 1336 | 0.8072706 | NA | NA | NA | 0.0e+00 | 4.8e+00 | 5.0e+00 | 4.688749e+00 | 5.324905e-01 | NA | ▁▁▁▁▇ | NA |
review_scores_accuracy | numeric | 1363 | 0.8033756 | NA | NA | NA | 1.0e+00 | 4.9e+00 | 5.0e+00 | 4.760332e+00 | 3.978002e-01 | NA | ▁▁▁▁▇ | NA |
review_scores_cleanliness | numeric | 1363 | 0.8033756 | NA | NA | NA | 1.0e+00 | 4.8e+00 | 5.0e+00 | 4.705213e+00 | 4.266744e-01 | NA | ▁▁▁▁▇ | NA |
review_scores_checkin | numeric | 1363 | 0.8033756 | NA | NA | NA | 1.0e+00 | 4.9e+00 | 5.0e+00 | 4.813769e+00 | 3.696558e-01 | NA | ▁▁▁▁▇ | NA |
review_scores_communication | numeric | 1363 | 0.8033756 | NA | NA | NA | 1.0e+00 | 4.9e+00 | 5.0e+00 | 4.801094e+00 | 3.749976e-01 | NA | ▁▁▁▁▇ | NA |
review_scores_location | numeric | 1363 | 0.8033756 | NA | NA | NA | 1.0e+00 | 4.9e+00 | 5.0e+00 | 4.785983e+00 | 3.456745e-01 | NA | ▁▁▁▁▇ | NA |
review_scores_value | numeric | 1363 | 0.8033756 | NA | NA | NA | 1.0e+00 | 4.7e+00 | 5.0e+00 | 4.603264e+00 | 4.533995e-01 | NA | ▁▁▁▁▇ | NA |
license | logical | 6932 | 0.0000000 | NA | NA | : | NA | NA | NA | NaN | NA | NA | NA | NA |
instant_bookable | character | 0 | 1.0000000 | 2 | 0 | NA | 1 | NA | 1 | NA | NA | 0 | NA | NA |
calculated_host_listings_count | numeric | 0 | 1.0000000 | NA | NA | NA | 1.0e+00 | 1.0e+00 | 9.3e+01 | 7.974899e+00 | 1.869773e+01 | NA | ▇▁▁▁▁ | NA |
calculated_host_listings_count_entire_homes | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 1.0e+00 | 8.4e+01 | 6.792700e+00 | 1.730076e+01 | NA | ▇▁▁▁▁ | NA |
calculated_host_listings_count_private_rooms | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 0.0e+00 | 1.6e+01 | 1.088719e+00 | 2.168172e+00 | NA | ▇▁▁▁▁ | NA |
calculated_host_listings_count_shared_rooms | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 0.0e+00 | 2.0e+00 | 1.139640e-02 | 1.201762e-01 | NA | ▇▁▁▁▁ | NA |
reviews_per_month | numeric | 1336 | 0.8072706 | NA | NA | NA | 1.0e-02 | 5.4e-01 | 1.3e+01 | 1.063987e+00 | 1.482624e+00 | NA | ▇▁▁▁▁ | NA |
period | numeric | 0 | 1.0000000 | NA | NA | NA | 1.0e+00 | 2.0e+00 | 3.0e+00 | 2.023803e+00 | 8.197068e-01 | NA | ▇▁▇▁▇ | NA |
price_swiss_franc | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 1.0e+02 | 7.9e+04 | 1.700840e+02 | 1.126774e+03 | NA | ▇▁▁▁▁ | NA |
showergel_or_shampoo | logical | 0 | 1.0000000 | NA | NA | TRU: 3972, FAL: 2960 | NA | NA | NA | 5.729948e-01 | NA | NA | NA | NA |
wifi | logical | 0 | 1.0000000 | NA | NA | TRU: 6516, FAL: 416 | NA | NA | NA | 9.399885e-01 | NA | NA | NA | NA |
freeparking | logical | 0 | 1.0000000 | NA | NA | FAL: 5652, TRU: 1280 | NA | NA | NA | 1.846509e-01 | NA | NA | NA | NA |
pool | logical | 0 | 1.0000000 | NA | NA | FAL: 6740, TRU: 192 | NA | NA | NA | 2.769760e-02 | NA | NA | NA | NA |
dishwasher | logical | 0 | 1.0000000 | NA | NA | FAL: 4864, TRU: 2068 | NA | NA | NA | 2.983266e-01 | NA | NA | NA | NA |
washer | logical | 0 | 1.0000000 | NA | NA | TRU: 5207, FAL: 1725 | NA | NA | NA | 7.511541e-01 | NA | NA | NA | NA |
selfcheckin | logical | 0 | 1.0000000 | NA | NA | FAL: 5037, TRU: 1895 | NA | NA | NA | 2.733699e-01 | NA | NA | NA | NA |
petsallowed | logical | 0 | 1.0000000 | NA | NA | FAL: 5484, TRU: 1448 | NA | NA | NA | 2.088863e-01 | NA | NA | NA | NA |
refrigerator | logical | 0 | 1.0000000 | NA | NA | TRU: 4452, FAL: 2480 | NA | NA | NA | 6.422389e-01 | NA | NA | NA | NA |
airconditioner | logical | 0 | 1.0000000 | NA | NA | FAL: 6929, TRU: 3 | NA | NA | NA | 4.328000e-04 | NA | NA | NA | NA |
row_sums | numeric | 0 | 1.0000000 | NA | NA | NA | 0.0e+00 | 4.0e+00 | 9.0e+00 | 3.899740e+00 | 1.583501e+00 | NA | ▁▆▇▃▁ | NA |
DataExplorer::plot_bar(calendar_short, title = "Calendar")
DataExplorer::plot_bar(listings_short, title = "Listings")
Below we list all R packages used in the analysis, together with their versions, so that the analysis and its results can be reproduced.
pander(sessionInfo(), compact = TRUE)
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: LC_COLLATE=German_Switzerland.utf8, LC_CTYPE=German_Switzerland.utf8, LC_MONETARY=German_Switzerland.utf8, LC_NUMERIC=C and LC_TIME=en_EU.UTF-8
attached base packages: grid, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: summarytools(v.1.0.1), vcd(v.1.4-11), broom.mixed(v.0.2.9.4), jtools(v.2.2.2), sp(v.2.0-0), spatstat(v.3.0-6), spatstat.linnet(v.3.1-1), spatstat.model(v.3.2-4), rpart(v.4.1.19), spatstat.explore(v.3.2-1), nlme(v.3.1-160), spatstat.random(v.3.1-5), spatstat.geom(v.3.2-4), spatstat.data(v.3.0-1), osmdata(v.0.2.5), sf(v.1.0-14), testthat(v.3.1.6), readxl(v.1.4.2), psych(v.2.3.3), forcats(v.1.0.0), stringr(v.1.5.0), purrr(v.1.0.1), readr(v.2.1.4), tidyr(v.1.3.0), tibble(v.3.2.1), tidyverse(v.2.0.0), lubridate(v.1.9.2), DataExplorer(v.0.8.2), dplyr(v.1.1.2), viridis(v.0.6.4), viridisLite(v.0.4.1), DT(v.0.29), huxtable(v.5.5.2), gridExtra(v.2.3), plotly(v.4.10.2), ggmap(v.3.0.2), shiny(v.1.7.4), pander(v.0.6.5), kableExtra(v.1.3.4), knitr(v.1.42) and ggplot2(v.3.4.2)
loaded via a namespace (and not attached): backports(v.1.4.1), systemfonts(v.1.0.4), plyr(v.1.8.8), igraph(v.1.4.3), repr(v.1.1.6), lazyeval(v.0.2.2), splines(v.4.2.2), crosstalk(v.1.2.0), listenv(v.0.9.0), pryr(v.0.1.6), digest(v.0.6.31), htmltools(v.0.5.4), magick(v.2.7.4), fansi(v.1.0.4), magrittr(v.2.0.3), checkmate(v.2.1.0), tensor(v.1.5), tzdb(v.0.4.0), globals(v.0.16.2), matrixStats(v.0.63.0), svglite(v.2.1.1), timechange(v.0.2.0), spatstat.sparse(v.3.0-2), jpeg(v.0.1-10), colorspace(v.2.1-0), skimr(v.2.1.5), rvest(v.1.0.3), haven(v.2.5.2), xfun(v.0.37), tcltk(v.4.2.2), crayon(v.1.5.2), jsonlite(v.1.8.4), zoo(v.1.8-12), glue(v.1.6.2), polyclip(v.1.10-4), gtable(v.0.3.3), webshot(v.0.5.5), rapportools(v.1.1), abind(v.1.4-5), scales(v.1.2.1), DBI(v.1.1.3), Rcpp(v.1.0.10), xtable(v.1.8-4), units(v.0.8-3), proxy(v.0.4-27), htmlwidgets(v.1.6.1), httr(v.1.4.7), RColorBrewer(v.1.1-3), ellipsis(v.0.3.2), farver(v.2.1.1), pkgconfig(v.2.0.3), sass(v.0.4.5), deldir(v.1.0-9), utf8(v.1.2.3), labeling(v.0.4.2), reshape2(v.1.4.4), tidyselect(v.1.2.0), rlang(v.1.1.1), later(v.1.3.0), munsell(v.0.5.0), cellranger(v.1.1.0), tools(v.4.2.2), cachem(v.1.0.6), cli(v.3.6.0), generics(v.0.1.3), broom(v.1.0.5), evaluate(v.0.20), fastmap(v.1.1.0), yaml(v.2.3.7), goftest(v.1.2-3), RgoogleMaps(v.1.4.5.3), future(v.1.33.0), mime(v.0.12), xml2(v.1.3.3), brio(v.1.1.3), compiler(v.4.2.2), rstudioapi(v.0.14), curl(v.5.0.2), png(v.0.1-8), e1071(v.1.7-13), spatstat.utils(v.3.0-3), bslib(v.0.4.2), stringi(v.1.7.12), highr(v.0.10), desc(v.1.4.2), lattice(v.0.20-45), Matrix(v.1.6-1), commonmark(v.1.8.1), classInt(v.0.4-9), vctrs(v.0.6.2), pillar(v.1.9.0), lifecycle(v.1.0.3), networkD3(v.0.4), furrr(v.0.3.1), codebook(v.0.9.2), lmtest(v.0.9-40), jquerylib(v.0.1.4), data.table(v.1.14.8), bitops(v.1.0-7), httpuv(v.1.6.9), R6(v.2.5.1), promises(v.1.2.0.1), KernSmooth(v.2.23-20), parallelly(v.1.36.0), codetools(v.0.2-18), pkgload(v.1.3.2), MASS(v.7.3-58.2), assertthat(v.0.2.1), rprojroot(v.2.0.3), 
withr(v.2.5.0), mnormt(v.2.1.1), mgcv(v.1.8-41), parallel(v.4.2.2), hms(v.1.1.3), labelled(v.2.12.0), class(v.7.3-20), rmarkdown(v.2.20) and base64enc(v.0.1-3)
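As a complement to the printout above, the versions of the attached packages can also be collected in a data frame, which is easier to compare across machines than the formatted text. The following is only a minimal sketch and not part of the original analysis: the helper name `package_versions` is hypothetical, and the fallback to the base packages is an assumption made so that the example also runs in a session with no add-on packages attached.

```r
# Hypothetical helper (not from the original analysis): return a data frame
# with the installed version of each package named in `pkgs`.
package_versions <- function(pkgs) {
  data.frame(
    package = pkgs,
    version = vapply(
      pkgs,
      function(p) as.character(utils::packageVersion(p)),
      character(1),
      USE.NAMES = FALSE
    ),
    stringsAsFactors = FALSE
  )
}

# Names of the add-on packages attached in the current session.
attached <- names(sessionInfo()$otherPkgs)

# Assumption: if no add-on package is attached, report the base packages instead.
if (is.null(attached)) {
  attached <- sessionInfo()$basePkgs
}

versions <- package_versions(attached)
print(versions)
```

The resulting data frame could, for instance, be written to a CSV file next to the report and compared against a later run to see which package versions have changed.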
Email: andri.gerber@stud.hslu.ch. Department of Business, Lucerne University of Applied Sciences and Arts (HSLU), Lucerne, Switzerland.
Email: matthias.schmid@stud.hslu.ch. Department of Business, Lucerne University of Applied Sciences and Arts (HSLU), Lucerne, Switzerland.