Kriss Jessop

AirBnB Preprocessing

Posted on December 19th, 2016

Before I could start applying Dispersive Flies Optimisation (DFO) to the AirBnB data, it occurred to me that while I could use the information as it was, things would be much simpler further down the line if I processed the data I needed (namely the amenities field) into fields of their own with boolean 0/1 values, rather than parsing the amenities field each time a fitness check was performed. It should probably be noted early that DFO is intended for searches with high dimensionality. My search is likely only going to use a few dimensions, and it’s entirely possible that a more conventional search algorithm would perform better.

My approach to processing the information involved 3 steps.

Step 1: Parse every amenities string into a complete list of amenities, splitting each string on commas and adding any new amenities to a master list as the entries are searched.
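As a rough illustration of Step 1, here is a minimal Python sketch. It assumes each amenities string is comma-separated, optionally wrapped in braces with multi-word names in double quotes; that format, the sample rows and the function name `collect_amenities` are assumptions for the example rather than details taken from my actual data.

```python
def collect_amenities(amenity_strings):
    """Build a master list of distinct amenities, in first-seen order."""
    master = []
    seen = set()
    for raw in amenity_strings:
        # Assumed format: comma-separated, optionally wrapped in braces,
        # with quoted multi-word names, e.g. {TV,"Wireless Internet"}.
        for part in raw.strip("{}").split(","):
            name = part.strip().strip('"')
            if name and name not in seen:
                seen.add(name)
                master.append(name)
    return master


# Hypothetical sample rows, including a listing with no amenities.
rows = ['{TV,Internet,"Wireless Internet"}', "{Internet,Kitchen}", "{}"]
master_list = collect_amenities(rows)
```

Keeping a set alongside the list makes the "is this one new?" check cheap while preserving the order in which amenities were first seen. A plain split on commas would break if an amenity name itself contained a comma, which this sketch does not handle.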

Step 2: I used the complete list of amenities to build a new table with one column per amenity, each a tinyint of either 0 or 1. I then copied the relevant fields into the new table, searching each entry’s amenities string with a LIKE '%name%' match against each column name. (Admittedly there was some overlap that I ignored, e.g. Internet and Wireless Internet. Internet’s internet, guys. I can just leave wireless off the end-user search form.) Each listing was also given a weight, which for the moment I set to be the listing’s price.
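Step 2 could look something like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for whatever database is actually in use; the `listings` table, its columns and the three-item amenity list are made up for the example, and only `searchSpace` and `weight` are names from this post.

```python
import sqlite3

amenities = ["TV", "Internet", "Kitchen"]  # stand-in for the master list

conn = sqlite3.connect(":memory:")
# Hypothetical source table holding the raw listing data.
conn.execute("CREATE TABLE listings (id INTEGER, price INTEGER, amenities TEXT)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?)",
    [(1, 80, '{TV,"Wireless Internet"}'), (2, 120, "{Internet,Kitchen}")],
)

# One 0/1 column per amenity, plus a weight that starts out as the price.
# Column names are quoted because real amenity names can contain spaces.
cols = ", ".join(f'"{a}" TINYINT DEFAULT 0' for a in amenities)
conn.execute(f"CREATE TABLE searchSpace (id INTEGER, weight INTEGER, {cols})")
conn.execute("INSERT INTO searchSpace (id, weight) SELECT id, price FROM listings")

# Flag every listing whose amenities string contains the column name.
for a in amenities:
    conn.execute(
        f'UPDATE searchSpace SET "{a}" = 1 WHERE id IN '
        "(SELECT id FROM listings WHERE amenities LIKE ?)",
        (f"%{a}%",),
    )

result = conn.execute("SELECT * FROM searchSpace ORDER BY id").fetchall()
```

Listing 1 only mentions "Wireless Internet", yet its Internet column ends up as 1 because the pattern %Internet% matches it. That is the overlap mentioned above, and in this sketch it is left in on purpose.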

Step 3: UPDATE searchSpace SET weight = (weight*-1)+1000; Since this is a maximisation problem, I want the cheaper places to be more important than the pricier ones. Thus, this one SQL query negates each weight and adds 1000 to it, where 1000 is the highest price in the list rounded up to the nearest 10.
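To make the arithmetic concrete, here is a small sketch of the same inversion, again using Python's sqlite3 with an in-memory table as a stand-in. The three sample prices are invented, and the offset is computed from the data rather than hard-coded, which is an assumption about how the 1000 could be derived.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searchSpace (id INTEGER, weight INTEGER)")
# Hypothetical listings whose weight is currently just the price.
conn.executemany(
    "INSERT INTO searchSpace VALUES (?, ?)", [(1, 80), (2, 120), (3, 995)]
)

# Offset = highest price rounded up to the nearest 10 (1000 for this sample).
(max_price,) = conn.execute("SELECT MAX(weight) FROM searchSpace").fetchone()
offset = math.ceil(max_price / 10) * 10

# Negate each weight and add the offset, so cheaper listings score higher.
conn.execute("UPDATE searchSpace SET weight = (weight * -1) + ?", (offset,))
weights = conn.execute(
    "SELECT id, weight FROM searchSpace ORDER BY id"
).fetchall()
```

With these sample prices the weights become 920, 880 and 5, so the cheapest listing now has the largest weight. One thing to be aware of: if the highest price is already a multiple of 10, the priciest listing ends up with a weight of exactly 0.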