AirBnB Preprocessing

Posted on December 19th, 2016

My approach to processing the information involved 3 steps.

Step 1: Parse every amenities string into a complete list of amenities, splitting each string on commas and adding any new amenities to a master list as the entries are searched.
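A minimal Python sketch of Step 1. The function name and the sample strings are made up for illustration, and it assumes each amenities string is a plain comma-separated list with no quoting or braces to strip.

```python
def build_master_list(amenity_strings):
    """Collect every distinct amenity, in the order it is first seen."""
    master = []
    seen = set()
    for raw in amenity_strings:
        # Split on commas and trim the whitespace around each entry.
        for item in raw.split(","):
            name = item.strip()
            if name and name not in seen:
                seen.add(name)
                master.append(name)
    return master


# Hypothetical sample data, not taken from the real listings.
sample = ["TV, Internet, Kitchen", "Internet, Wireless Internet"]
master_list = build_master_list(sample)
```

Keeping a set next to the list makes the "is this amenity new?" check cheap, while the list preserves first-seen order for building table columns later.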

Step 2: Use the complete list of amenities to build a new table with one column per amenity, each a tinyint of either 0 or 1. I then copied the relevant fields into the new table, checking each listing's amenities string with LIKE '%amenity%' for every column. (Admittedly there was some overlap that I ignored. E.g. Internet and Wireless Internet: a pattern for Internet also matches Wireless Internet. Internet’s internet, guys. I can just leave Wireless off the end-user search form.) Each listing was also given a weight, which for the moment I set to be the listing’s price.
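Step 2 might look roughly like the sketch below, which uses Python's built-in sqlite3 module as a stand-in for whatever database is really in use. The source table `listings`, its columns (`id`, `price`, `amenities`), the `column_name` helper and the sample rows are all assumptions made for the example. Only `searchSpace` and `weight` come from the actual schema.

```python
import sqlite3


def column_name(amenity):
    # Hypothetical helper: "Wireless Internet" -> "wireless_internet".
    # Assumes the result is a valid identifier (e.g. no leading digit).
    return "".join(c.lower() if c.isalnum() else "_" for c in amenity)


def build_search_space(conn, amenities):
    cols = [column_name(a) for a in amenities]
    col_defs = ", ".join(f"{c} TINYINT NOT NULL DEFAULT 0" for c in cols)
    conn.execute(
        f"CREATE TABLE searchSpace (id INTEGER PRIMARY KEY, weight INTEGER, {col_defs})"
    )
    # Copy the relevant fields; the weight starts out as the price.
    conn.execute("INSERT INTO searchSpace (id, weight) SELECT id, price FROM listings")
    # Flag each amenity column wherever the amenities string matches LIKE.
    for amenity, col in zip(amenities, cols):
        conn.execute(
            f"UPDATE searchSpace SET {col} = 1 WHERE id IN "
            "(SELECT id FROM listings WHERE amenities LIKE ?)",
            (f"%{amenity}%",),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, price INTEGER, amenities TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(1, 120, "TV,Internet,Kitchen"), (2, 80, "Wireless Internet,Kitchen")],
)
build_search_space(conn, ["TV", "Internet", "Wireless Internet", "Kitchen"])
rows = conn.execute(
    "SELECT id, weight, tv, internet, wireless_internet, kitchen "
    "FROM searchSpace ORDER BY id"
).fetchall()
```

Listing 2 only advertises Wireless Internet, yet its `internet` column is also set to 1, because the pattern `%Internet%` matches inside "Wireless Internet". That is the overlap mentioned above.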

Step 3: UPDATE searchSpace SET weight = (weight*-1)+1000; Since this is a maximisation problem, I want the cheaper places to be more important than the pricier ones. Thus, this one SQL statement negates each weight and adds 1000 to it, where 1000 is the highest price in the list rounded UP to the nearest 10. A listing priced at 150, for example, ends up with a weight of 850.
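For illustration, here is a small Python and sqlite3 sketch of Step 3 that works out the offset from the data instead of hard-coding 1000. The `invert_weights` function and the three sample prices are hypothetical, and the sketch assumes weight still holds the raw price when it runs.

```python
import math
import sqlite3


def invert_weights(conn):
    """Flip weights so that cheaper listings score higher; return the offset used."""
    (max_price,) = conn.execute("SELECT MAX(weight) FROM searchSpace").fetchone()
    # Round the highest price UP to the nearest 10.
    ceiling = math.ceil(max_price / 10) * 10
    conn.execute("UPDATE searchSpace SET weight = (weight * -1) + ?", (ceiling,))
    return ceiling


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searchSpace (id INTEGER PRIMARY KEY, weight INTEGER)")
# Hypothetical prices, used only to exercise the arithmetic.
conn.executemany("INSERT INTO searchSpace VALUES (?, ?)", [(1, 120), (2, 80), (3, 995)])
offset = invert_weights(conn)
weights = conn.execute("SELECT id, weight FROM searchSpace ORDER BY id").fetchall()
```

With these sample prices the offset comes out at 1000, so the cheapest listing (80) gets the largest weight (920) and the priciest (995) gets the smallest (5). Note that a top price that is already a multiple of 10 would give that listing a weight of exactly 0.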