Estimating Airbnb Prices

The challenge of getting results out of limited data

Jan 7, 2021 · 5 min read

As I approached the end of my schooling at Lambda, we capped it off with one final project called Resfeber (RACE-fay-ber): the restless race of the traveler’s heart before the journey begins. Resfeber, we were told, was going to be a cross-discipline web app that would help families plan and prepare for a road trip and estimate what the trip would cost, including likely Airbnb and gas prices, all in one simple-to-use application.

Our team consisted of 5 DS, 3 Web, 2 iOS, and 2 UX students. As Data Science students, one of my teammates and I took on the challenge of predicting Airbnb prices for a location entered within the United States. Going into this project, my first concern was whether the data we used would contain enough information to make those predictions.

Our first couple of weeks on the project were spent planning what needed to be accomplished and how we were going to achieve those goals. To do that, we worked as a team to come up with possible user stories: statements that explain both who needs a feature and what they would like that feature to do. For example:

User Story: “As a user, I would like to be able to find the average price of Airbnbs in the desired location.”

An example user story posted to Trello, with its component tasks broken down as a To-Do list.

We then broke that user story into individual tasks, as the Trello card shows. That way we could keep track of what needed to be done and manage our time more effectively.

Creating a Solution to the Problem

Once the planning for our project was done, we got to work on our assigned portions. Anita and I had volunteered to work on the Airbnb price predictions. This included:

Cleaning the provided Kaggle dataset so that it could be used, by eliminating the columns that did not work for the application.
Splitting the data into training and validation sets.
Building multiple models and tuning them to maximize prediction accuracy, measured with the Mean Absolute Error (MAE).
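
The split-and-score steps above can be sketched in a few lines of plain Python. This is only an illustration, not our project code: the listings are made up, the column names (`room_type`, `price`, and so on) are assumed, and a simple median-per-room-type baseline stands in for the Random Forest and Neural Network models we actually tried.

```python
import random
from statistics import median

def train_val_split(rows, val_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|, in the same units as the prices (dollars)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def fit_median_baseline(train_rows):
    """Stand-in model: predict the median training price for each room type."""
    by_type = {}
    for row in train_rows:
        by_type.setdefault(row["room_type"], []).append(row["price"])
    overall = median(row["price"] for row in train_rows)
    medians = {room: median(prices) for room, prices in by_type.items()}
    # Fall back to the overall median for a room type not seen in training.
    return lambda row: medians.get(row["room_type"], overall)

# Made-up listings, only to exercise the functions.
listings = [
    {"latitude": 40.71, "longitude": -74.00, "room_type": "Entire home/apt", "price": 180},
    {"latitude": 40.73, "longitude": -73.99, "room_type": "Private room", "price": 85},
    {"latitude": 34.05, "longitude": -118.24, "room_type": "Entire home/apt", "price": 210},
    {"latitude": 34.10, "longitude": -118.30, "room_type": "Shared room", "price": 40},
    {"latitude": 36.17, "longitude": -115.14, "room_type": "Private room", "price": 70},
    {"latitude": 36.12, "longitude": -115.17, "room_type": "Entire home/apt", "price": 150},
    {"latitude": 47.61, "longitude": -122.33, "room_type": "Private room", "price": 95},
    {"latitude": 47.62, "longitude": -122.35, "room_type": "Shared room", "price": 45},
    {"latitude": 41.88, "longitude": -87.63, "room_type": "Entire home/apt", "price": 165},
    {"latitude": 41.90, "longitude": -87.65, "room_type": "Private room", "price": 80},
]

train, val = train_val_split(listings)
model = fit_median_baseline(train)
val_mae = mean_absolute_error([r["price"] for r in val], [model(r) for r in val])
print(f"Validation MAE: ${val_mae:.2f}")
```

Because MAE is expressed in dollars, it is easy to compare directly against typical price differences between listings.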

As we worked through these steps, the biggest problem we came across was that after trying several models, ranging from a Random Forest to a Neural Network, the best MAE we could get was about $141. Considering that Airbnb prices often differ by only $10–50, an average error of $141 is very high.

At first, we thought we would need to change the type of model we used, but after trying several we saw little difference in the MAE. I then looked at the features we were using to predict the prices. The original dataset had 16 different features per entry, but once trimmed down to what could reasonably be asked of someone using the app, we had only 3 features we could use in the model: latitude, longitude, and room type.
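
To make that concrete, here is a minimal sketch of how those three inputs might be turned into numbers a model can use. The room type categories and the function name are assumptions for illustration, not taken from our code.

```python
# Assumed room type categories; the real dataset's labels may differ.
ROOM_TYPES = ["Entire home/apt", "Private room", "Shared room"]

def encode_listing(latitude, longitude, room_type):
    """Turn the three user-supplied inputs into a numeric feature vector."""
    if room_type not in ROOM_TYPES:
        raise ValueError(f"Unknown room type: {room_type!r}")
    # One-hot encode the room type: 1.0 for the matching category, 0.0 otherwise.
    one_hot = [1.0 if room_type == known else 0.0 for known in ROOM_TYPES]
    return [float(latitude), float(longitude)] + one_hot

print(encode_listing(36.17, -115.14, "Private room"))
# [36.17, -115.14, 0.0, 1.0, 0.0]
```

With so few inputs, two listings on the same block with the same room type look identical to the model, no matter how different their prices are.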

This didn’t bode well for improving our accuracy with the data we had.

Finally, I dug into the data itself and found that we hadn’t removed any outliers, and some were so far from the rest of the data that they were badly throwing off the predictions. The maximum price in the data was a whopping $15,339.00, something the average Airbnb user would never even consider paying for a stay. I then trimmed out the listings priced above $500, as that seemed like the high-end price most people would be willing to pay.

The price distribution of the listings that I kept in the dataset.
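
A small sketch of that trimming step, assuming each listing is a dict with a `price` key. The prices are made up apart from the $15,339 maximum, and the example only shows how much a single extreme listing can pull on a simple average.

```python
from statistics import mean

PRICE_CAP = 500  # dollars; the cutoff described above

def trim_outliers(rows, cap=PRICE_CAP):
    """Keep only listings priced at or below the cap."""
    return [row for row in rows if row["price"] <= cap]

# Made-up prices, except for the 15339 maximum found in the data.
listings = [{"price": p} for p in (95, 120, 150, 210, 15339)]
kept = trim_outliers(listings)

print("Mean price before trimming:", mean(r["price"] for r in listings))
print("Mean price after trimming: ", mean(r["price"] for r in kept))
print("Listings removed:", len(listings) - len(kept))
```

A model trained on the untrimmed data gets pulled toward those extreme values in the same way, which inflates the error on ordinary listings.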

After that was cleaned up, the prediction accuracy improved by leaps and bounds. We were able to achieve an MAE of $59.17. While not as good as one would hope, it was far better than what we were getting before.

The product’s outcome and beyond

Once that was figured out, the DS team was able to deploy the model on AWS as an API. This allows both the iOS and Web team members to pass in the desired parameters and receive a prediction back: a JSON payload with the correct data is sent to the API, and the prediction is returned. The API can also predict gas prices based on the date entered.
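
As a rough idea of what a client call could look like, the sketch below builds (but does not send) a JSON request. The URL is a placeholder and the field names are assumptions; the real endpoint and payload shape are defined by the deployed API.

```python
import json
from urllib import request

# Hypothetical placeholder, not the real endpoint.
API_URL = "https://example.com/airbnb/predict"

def build_prediction_request(latitude, longitude, room_type, url=API_URL):
    """Build (but do not send) a JSON POST request for a price prediction."""
    # Field names are assumed for illustration.
    payload = {"latitude": latitude, "longitude": longitude, "room_type": room_type}
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_prediction_request(36.17, -115.14, "Private room")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be request.urlopen(req), followed by json.load() on the response.
```

Keeping the request to the same three inputs the model was trained on means the front end never has to ask the user for anything the model cannot use.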

If we were to continue working on this product, I could see us increasing the accuracy of the predictions by using fuller, more predictive data that covers more than just a few random cities in the US. We would also work on adding more features, like finding the nearest Tesla charging station for users with electric cars, or finding the average prices of attractions within the desired area. The biggest challenge I could see with all of these would be finding or collecting the right data to make those predictions.

Overall, I learned a lot from this project, from coordinating with a team on a large-scale project to figuring out how best to update the GitHub repository without overwriting someone else’s work. My team was a great help throughout this process, and I even received some feedback on how to make the predictions work well with the front end of the project. Without that, we would have been trying to predict with a model that had too much noise.

All this really helped me to grow professionally. I was able to see what did and didn’t work within a team environment. I was able to practice both cleaning and analyzing data so that it could be used to make predictions. And most importantly, I learned, and continue to learn, how to use the team’s skills and experience for the good of the whole team, by taking advantage of what people are good at rather than needing everyone to be an expert in everything.

For those who would like to take a look, the DS section of the project is available on GitHub.

Written by Ethan Holden