Sicily-AirBnb homes analysis
Intro
In this post we’ll explore data from the AirBnb homes in Sicily, as part of a project from Udacity’s Data Scientist Nanodegree.
What is Sicily
According to Wikipedia:
Sicily (Italian: Sicilia [siˈtʃiːlja]; Sicilian: Sicilia [sɪˈʃiːlja]) is the largest island in the Mediterranean Sea and one of the 20 regions of Italy. The region has 5 million inhabitants. Its capital city is Palermo.
Sicily is in the central Mediterranean Sea, south of the Italian Peninsula, from which it is separated by the narrow Strait of Messina. Its most prominent landmark is Mount Etna, the tallest active volcano in Europe,[5] and one of the most active in the world, currently 3,329 m (10,922 ft) high. The island has a typical Mediterranean climate.
The earliest archaeological evidence of human activity on the island dates from as early as 12,000 BC.[6][7] By around 750 BC, Sicily had three Phoenician and a dozen Greek colonies and it was later the site of the Sicilian Wars and the Punic Wars. After the fall of the Roman Empire in the 5th century AD, Sicily was ruled during the Early Middle Ages by the Vandals, the Ostrogoths, the Byzantine Empire, and the Emirate of Sicily. The Norman conquest of southern Italy led to the creation of the County of Sicily in 1071, that was succeeded by Kingdom of Sicily, a state that existed from 1130 until 1816.[8][9] Later, it was unified under the House of Bourbon with the Kingdom of Naples as the Kingdom of the Two Sicilies. The island became part of Italy in 1860 following the Expedition of the Thousand, a revolt led by Giuseppe Garibaldi during the Italian unification, and a plebiscite. Sicily was given special status as an autonomous region on 15 May 1946, 18 days before the Italian institutional referendum of 1946.
Sicily has a rich and unique culture, especially with regard to the arts, music, literature, cuisine, and architecture. It is also home to important archaeological and ancient sites, such as the Necropolis of Pantalica, the Valley of the Temples, Erice and Selinunte. Byzantine, Arab, Roman and Norman rule over Sicily has led to a blend of cultural influences.
From my point of view, Sicily is a wonderful island, which I explored a bit on bicycle and where I would return at any time. Here are some pictures from my trip.
Where to stay in Sicily
The loveliest place I’ve stayed in Sicily was this mountain hut at 1820m altitude, called Rifugio Timparossa, but I definitely have a special taste regarding comfort 🙊 So, better trust the data, not myself! 🤫
What to expect from this post
Udacity’s team suggested the following directions:
- understand how much AirBNB homes are earning in certain time frames and areas
- compare rates between some cities
- try to understand if there is anything about the properties that helps predict the price
- find negative and positive reviews based on text
What I’ve managed to do so far:
- where to find an AirBNB home in Sicily
- how are the prices distributed
- which are the cheapest/most expensive neighbourhoods
- which room types are available
- who and where are the hosts with most reviews
- where are the verified hosts or the super-hosts
- how are the features of a home correlated
- how much a feature impacts the price of a property
- are the prices higher during the summer
- are the properties available in the future
- a basic exploration of what people say about the places they’ve been to
Let’s explore
Where are the Airbnb homes
We’ll use the map coordinates, latitude and longitude, to plot a heatmap of the AirBnb homes available.
The properties are all over the island, and also on the small islands nearby, like Isola di Pantelleria or Lampedusa.
Most of them are near the coast, in the big cities like Palermo, the capital city, but also in Catania, Siracusa, Ragusa, Agrigento and Trapani.😎
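For the curious, here is a minimal sketch of the binning behind such a heatmap. It assumes the coordinates are available as `latitude` and `longitude` arrays; the sample points and the bounding box are made up for illustration and are not taken from the dataset. Each grid cell simply counts the listings that fall inside it, and a plotting library can then turn the counts into colours.

```python
import numpy as np

# Hypothetical sample coordinates (roughly Palermo, Catania, Siracusa); not real listings.
latitude = np.array([38.12, 38.11, 38.13, 37.50, 37.51, 37.07])
longitude = np.array([13.36, 13.35, 13.37, 15.09, 15.08, 15.29])

# Bin the points into a 4x4 grid over an approximate bounding box of the main island.
counts, lat_edges, lon_edges = np.histogram2d(
    latitude, longitude, bins=4, range=[[36.5, 38.5], [12.0, 16.0]]
)

# The cell with the highest count is where the heatmap would be "hottest".
row, col = np.unravel_index(np.argmax(counts), counts.shape)
print(counts.sum(), counts[row, col])
```

A finer grid gives a sharper map but needs more listings per cell to look smooth.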
Where are the super hosts
If you want to be really picky, try only the super hosts, although the options are not that many and I don’t actually know what a super host is 😀
What about the verified hosts
You might also want to stick to verified hosts only, but that is a matter of trust and of the time of year: sometimes you’ll simply have to choose from what’s available.
Which are the most crowded neighbourhoods
Palermo, the capital city, Siracusa and Catania have the largest number of Airbnb homes. The other cities in this top 10 are smaller in area, but still offer a wide range of homes.
Which room types are available
It seems that most listings are entire homes or apartments, which is great!
Cheapest neighbourhoods
The mean price for our Airbnb homes is 95 dollars while the median is only 60, so the distribution is strongly right-skewed: a few very expensive listings pull the mean up. Still, 75% of prices are below 93$.
Here are the cheapest neighbourhoods by median price, counting only neighbourhoods with at least 100 Airbnb homes available. So, you can visit Catania, Trapani, Messina, Palermo and other cities for less than 50$ per night.
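The ranking above can be sketched roughly like this. The column names `neighbourhood` and `price` are assumptions, the rows are toy data, and the minimum-listings threshold is lowered from 100 to 3 so that the example fits in a few lines.

```python
import pandas as pd

# Hypothetical toy listings; column names are assumptions, values are made up.
listings = pd.DataFrame({
    "neighbourhood": ["Catania"] * 3 + ["Taormina"] * 3 + ["Gela"],
    "price": [40, 45, 50, 80, 85, 300, 30],
})

# A mean well above the median signals a right-skewed price distribution.
print(listings["price"].mean(), listings["price"].median())

MIN_LISTINGS = 3  # the post uses 100; lowered here to fit the toy data

# Median price and listing count per neighbourhood, keeping only the well-populated ones.
stats = listings.groupby("neighbourhood")["price"].agg(["median", "count"])
cheapest = stats[stats["count"] >= MIN_LISTINGS].sort_values("median")
print(cheapest)
```

Using the median instead of the mean keeps a single luxury villa from pushing a whole neighbourhood up the ranking.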
Most expensive neighbourhoods
Taormina, Trecastagni and Noto are the most expensive places. Their median price per night is around 85$, but I can assure you that they are worth it!
Prices
Prices by category values
We’ll check how prices are distributed across the values of qualitative features like neighbourhood, room/property type and host response time, and try to see whether prices go up for some categories.
There are no significant differences for features like:
- the host identity is verified or not
- the host has a profile picture
- the property is instantly bookable
There are some differences for:
- the availability of the home: the median price is lower and the range of prices is wider for homes that are available
- the host type: the price range for normal hosts (about 85% of total hosts) is wider and the median price is a bit higher than for the super-hosts
There are clear differences in price for:
- the host response time: when the host responds within a day, both the median and the price range are higher
- the number of bathrooms: 3 or 4 baths definitely means a higher price (I’ve plotted only 10 categories; the bath types grouped under ‘Other’ vary a lot, from more than 5 baths to shared baths)
- the property type: an entire villa is also more expensive (not a surprise either)
Categorical variables’ impact on price
Here’s another way to estimate how price is influenced by the categorical variables. For each variable, prices are partitioned into distinct sets based on the category values. Then we check with a one-way ANOVA test whether these sets have similar means. If a variable has only a minor impact, the set means should be roughly equal.
From the ANOVA test, it seems that the property type and the neighbourhood are the most important categorical features in terms of impact on price, which is definitely not a big surprise.
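As a rough illustration of that test, here is a minimal sketch using `scipy.stats.f_oneway`. The column names `room_type` and `price` and all the values are assumptions made up for the example; the real analysis would loop over every categorical column.

```python
import pandas as pd
from scipy.stats import f_oneway

# Hypothetical toy data; column names are assumptions, values are made up.
listings = pd.DataFrame({
    "room_type": ["Entire home"] * 4 + ["Private room"] * 4,
    "price": [90, 100, 110, 120, 40, 45, 50, 55],
})

# Partition prices into one sample per category value.
groups = [g["price"].to_numpy() for _, g in listings.groupby("room_type")]

# One-way ANOVA: a small p-value suggests that the group means differ.
f_stat, p_value = f_oneway(*groups)
print(f_stat, p_value)
```

Ranking the variables by p-value (or by F statistic) then gives an ordering of their apparent impact on price. With prices as skewed as ours, the normality assumption behind ANOVA is shaky, so the ordering is best read as a rough guide.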
Numerical variables’ impact on price
We’ve encoded the categorical variables by the mean price of each category value (called target encoding). Then we’ve calculated Spearman correlation coefficients, which pick up monotonic relationships between variables even when they are nonlinear. Here are the obtained scores:
From the Spearman scores, the main criteria in establishing Airbnb home prices are:
- accommodates
- bathrooms (target encoded)
- property type (encoded)
- neighbourhood (encoded)
- number of reviews
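The encoding and scoring steps can be sketched as follows. This is a simplified version under stated assumptions: the column names `property_type`, `accommodates` and `price` are guesses at the dataset's schema, and the six rows are invented.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions, values are made up.
listings = pd.DataFrame({
    "property_type": ["Villa", "Villa", "Apartment", "Apartment", "Room", "Room"],
    "accommodates": [8, 6, 4, 3, 2, 1],
    "price": [300, 200, 80, 70, 40, 30],
})

# Target encoding: replace each category with the mean price of that category.
listings["property_type_enc"] = (
    listings.groupby("property_type")["price"].transform("mean")
)

# Spearman correlates ranks, so it captures monotonic, possibly nonlinear relationships.
scores = (
    listings[["accommodates", "property_type_enc", "price"]]
    .corr(method="spearman")["price"]
    .drop("price")
    .sort_values(ascending=False)
)
print(scores)
```

Note that encoding a category with the target measured on the same rows flatters its correlation with price. That is fine for a rough ranking like this one, but a predictive model would need the means computed on separate data.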
Another way to determine features’ impact on price
We’ve used the available features to build a model that predicts the price.
Then we plot the feature importances of the model (CatBoost) to see which features affect the price most.
The CatBoost model gives a new perspective on the features and their importance in predicting the price. There are many transformations one can apply to both the model and the features, which would probably lead to a different ranking of feature importances, but this is outside the scope of this post.
The future
Listings available by day
How many listings (or Airbnb homes) are available in the near future:
There are plenty of homes available until 2022! We must only pray for covid to go away! 🙏
Average price by day
The property types are quite diverse, so we’ll look only at those for two people.
Prices do go up a bit during summer, but I wouldn’t pay more to be more uncomfortable during the hot summer days of Sicily 🌞
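Both charts in this section come down to a group-by on the calendar data. Here is a minimal sketch, assuming a calendar table with columns `listing_id`, `date`, `available` and `price` (the names and the rows are assumptions for illustration); filtering to homes for two people would happen before this step.

```python
import pandas as pd

# Hypothetical calendar rows; column names are assumptions, values are made up.
calendar = pd.DataFrame({
    "listing_id": [1, 2, 1, 2, 1, 2],
    "date": pd.to_datetime(
        ["2021-03-01", "2021-03-01", "2021-07-01", "2021-07-01", "2021-08-01", "2021-08-01"]
    ),
    "available": [True, True, True, False, True, True],
    "price": [50.0, 60.0, 70.0, 80.0, 75.0, 85.0],
})

# Number of available listings and their mean price per day.
open_days = calendar[calendar["available"]]
daily = open_days.groupby("date").agg(
    listings_available=("listing_id", "nunique"),
    mean_price=("price", "mean"),
)

# Average the daily means by calendar month to see the seasonal pattern.
monthly_price = daily["mean_price"].groupby(daily.index.month).mean()
print(daily)
print(monthly_price)
```

Keep in mind that the daily mean only covers homes still available on that day, so a summer bump may partly reflect which homes are left rather than a pure price increase.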
What people say about Airbnb homes in Sicily
Hosts with most reviews
Take me to the clouds
There are reviews in many languages, but most of them are in English, Italian or French. Here are the most common words people used in their reviews.
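A word cloud is essentially a picture of word frequencies, so the counting behind it can be sketched with the standard library alone. The review snippets and the tiny stop-word list below are invented for illustration; a real run would need a proper stop-word list per language.

```python
import re
from collections import Counter

# Hypothetical review snippets, not taken from the dataset.
reviews = [
    "Great location and a great host!",
    "Ottima posizione, host gentile.",
    "The apartment was clean, great view.",
]

# A tiny, illustrative stop-word list; a real one would be per-language.
stop_words = {"and", "a", "the", "was"}

# Lowercase, keep runs of letters (including accented ones), drop stop words.
words = [
    w
    for text in reviews
    for w in re.findall(r"[^\W\d_]+", text.lower())
    if w not in stop_words
]
top_words = Counter(words).most_common(2)
print(top_words)
```

With reviews in several languages mixed together, the same idea splits its count across words like "great" and "ottima", which is one reason to separate the reviews by language first.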
Positive/negative reviews
The analysis was quite basic and incomplete: I’ve used Python’s SentimentIntensityAnalyzer (VADER) to get positive, negative, neutral and compound scores for the reviews.
The scores are not very informative:
- the score of neutrality is mostly 1
- the scores for positivity/negativity are mostly 0
Takeaways
Some conclusions
- we did not discover much with this analysis, but I enjoyed remembering some places I’ve been to and playing with the graphics and images above
- Sicily has a lot of Airbnb properties all over the island, at every price level
- you’ll pay more for more baths and bedrooms (not a surprise)
- the most expensive neighbourhoods are the ones loved by tourists (like Taormina)
- the cheapest are Gela and Catania (big city with lots of homes available)
- reviews are an important factor in price
- prices go up a bit during summer
- most properties are available until 2022, but this is probably due to covid restrictions :(
Some TODOs
- analyse the reviews further and build other models to predict the price of a property (note to self😉)
- go to Sicily and eat a “cannolo siciliano” 🤤! (note to the reader)