My book publication: Räumliche Analyse und Visualisierung von Mietpreisdaten, Geoinformatische Studie zur räumlichen Optimierung von Immobilienportalen

My dissertation has been published as a book by Springer Spektrum. The publication can be found at the following link: http://www.springer.com/de/book/9783658177737#springer

The table of contents can be downloaded here: http://bit.ly/2p0Aj6F

A product flyer can be found here: http://bit.ly/2ovHeAM

 

About the book

Harald Schernthanner pursues the goal of creating, from a geoinformatics perspective, a conceptual basis for the spatial optimization of real estate portals. The author assumes that methods of spatial statistics and machine learning for rent price estimation are better suited to the spatial optimization of real estate portals than the hedonic regression methods used so far. He shows that the web-based rent price maps published by real estate portals do not reflect the actual spatial conditions of real estate markets. Alternative web-based forms of representation, such as "grid maps", are superior to the status quo of the portals' price maps and visualize the actual spatial patterns of real estate prices more appropriately.

Web Feature Service: Rental offers from Berlin and its surroundings

Since September 2017 I have been fetching rental data for Berlin and its surroundings (e.g. Potsdam, Ludwigsfelde, Oranienburg). I hook into the Nestoria portal and parse its data into the shapefile format; shapefile is used for the convenience of several R libraries I work with.

So I thought, why not publish some of the data via my GeoServer instance as WFS? I chose the following daily snapshots: 7.11.17, 22.12.17, 23.12.17 and 19.1.18. From time to time I'll add more layers. Using this URL you can add the datasets as WFS layers to your favorite GIS: http://31.172.93.180:8080/geoserver/Rental_data/wfs?version=1.1.0&layers=Rental_data
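If your favorite GIS happens to be R, here is a minimal sketch of how one of the snapshots could be read straight from the WFS with the sf package. The layer name below is only a placeholder, and GDAL has to be built with its WFS driver; list the real layer names with st_layers() first.

# Minimal sketch (placeholder layer name): read a snapshot from the WFS with sf.
library(sf)

wfs_url <- "WFS:http://31.172.93.180:8080/geoserver/Rental_data/wfs"
st_layers(wfs_url)                                   # show the layers actually published
offers <- st_read(wfs_url, layer = "Rental_data:example_layer")  # replace with a real layer name
plot(st_geometry(offers))                            # quick look at the point locations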

Have fun with the data :-). If you need more data, or all of the data I have collected since September, just contact me.

Each dataset covers around 7,000 geocoded rental offers coming from different providers. I'll keep collecting, as I want to observe quarterly changes, the standard interval in rental market observation; let's see, maybe I can discover some interesting spatial patterns after a longer observation period. Maybe some of you have ideas for what else could be done with the data. Just as an eye-catcher I created the heat map below; unfortunately I can't figure out how to create a legend for the heat map in QGIS (2.18), so basically red means a high base rent per square metre. It seems that heat maps in QGIS don't have an option to add a legend. The grey points represent the locations of the rental offers.

Refer to this older blog post if you are interested in the data collection script: https://schernthanner.de/shell-script-to-automate-downloading-of-rental-data-using-wget-jq-a-cronjob

Shell script to automate the download of rental data using WGET, JQ & Cron

Until recently I fetched rental data via a script based on R's jsonlite library, accessing Nestoria's API (a vertical search engine that bundles the data of several real estate portals). My first script had to be executed manually; in my next attempt I started to automate the downloading, but unluckily Nestoria blocked my IP. I admit I downloaded data excessively, from a static IP, with a cron job that ran every day at the same time. This resulted in a 403 error (IP forbidden). So together with Nico (@bellackn) I figured out an alternative: instead of jsonlite, our shell script uses WGET and makes use of the great JQ tool (a command-line JSON processor). Thanks, Nico, for the input and ideas.

Next, a few of the most relevant lines of code are explained. The entire script can be viewed and downloaded from GitHub: https://github.com/hatschito/Rental_data_download_shell_script_WGET_JQ

We use the -w 60 and --random-wait flags, which tell WGET to pause a randomized interval (based on a 60-second wait) between requests. This behavior makes the automated downloads less likely to be blocked by the server. The area of interest is also defined within the WGET call; the API accepts either LAT/LONG in decimal degrees or place names.

wget -w 60 --random-wait -qO- "http://api.nestoria.de/api?country=de&pretty=1&encoding=json&action=search_listings&place_name=$place&listing_type=rent&page=1";

After that, the first page is downloaded. The first page has to be altered with the sed command (a UNIX tool for text processing). A while loop downloads the remaining pages; the number of pages to be downloaded can be modified. We receive JSON files that have to be parsed into a geodata format.

While Loop:

echo -e "\nOrt: $place\nSeite 1 wird heruntergeladen."   # status message ("Location: $place - downloading page 1")
sed '/application/ d' ./rentalprice-$datum.json > clr-rentalprice-$datum.json

i=1
# Insert the number of pages you want to download, here: 2 to 28.
# (Find out how many pages you need / Nestoria offers - the JSON with "locs" in the
# file name should have just one comma at the end of the file; lower the number
# according to the odd commas - e.g. for Potsdam it's 28.)
# -----> (You'll also have to comment out the deletion of the locs file further down the script to do so.)
while [ $i -le 25 ]
do
  printf "," >> ./rentalprice-locs-$datum.json
  i=$((i+1))
done

Parsing the JSON to CSV with JQ:

JQ, a command-line JSON processor, extracts the listings from the JSON response; csvkit's in2csv then converts them to CSV:

jq .response.listings < rental_prices-$place-$datum.json | in2csv -f json > CSV-rental_prices.csv

In the following step the data is loaded into a short R script. R's sp library converts the CSV to a shapefile (we will actually skip this part in the next version, where GDAL should manage the file conversion).
Back in the shell script, a timestamp with the current date and time is appended to the shapefile. After some cleaning at the end of the shell script, a cron job is finally created to schedule the daily data download. The cron job can also be set up via a GUI: https://wiki.ubuntuusers.de/GNOME_Schedule/
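Just to illustrate the idea, a minimal sketch of that CSV-to-shapefile step could look like the snippet below. It is not the exact code of the script, and the column names longitude/latitude are assumptions that have to be checked against the CSV header.

# Minimal sketch: promote the CSV produced by jq/in2csv to points and write a shapefile.
library(sp)
library(rgdal)

offers <- read.csv("CSV-rental_prices.csv", stringsAsFactors = FALSE)
coordinates(offers) <- ~ longitude + latitude             # assumed column names; creates a SpatialPointsDataFrame
proj4string(offers) <- CRS("+proj=longlat +datum=WGS84")  # Nestoria delivers WGS84 coordinates
writeOGR(offers, dsn = ".", layer = "rental_prices", driver = "ESRI Shapefile")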

The resulting shapefiles are still stored file-based, but I plan to hook the script up to a PostgreSQL database that is already installed on my Linux VServer.
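A rough sketch of how that PostGIS step could look from R is shown below; the connection details, database name and table name are placeholders, not my actual server configuration.

# Rough sketch (placeholder credentials): push a shapefile into PostGIS from R.
library(DBI)
library(RPostgres)
library(sf)

con <- dbConnect(Postgres(), dbname = "rental_data", host = "localhost",
                 user = "rental_user", password = "changeme")
offers <- st_read("rental_prices.shp")
st_write(offers, dsn = con, layer = "rental_offers", append = TRUE)  # append daily snapshots
dbDisconnect(con)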

Feel free to use our script if you are interested in downloading geocoded rental data for your own analysis and area of interest. Any feedback or comments are appreciated.

Do you need geocoded rental data for German cities? This small R script helps

The following short R script uses the JSON parser jsonlite and fetches geocoded real estate offers via the Nestoria REST API. The Nestoria API does not require any kind of authentication (such as OAuth or an API key) and delivers current real estate data from all big and also smaller German real estate portals. Downloads are limited to 20 or 50 data points per page and to 1,000 offers per request. So if you need a larger area, just adapt the bounding box or filter using the request parameters documented on the Nestoria developer website: http://www.nestoria.co.uk/help/api-search-listings
The script loops through the pages provided by the API and puts the result into an R dataframe. The dataframe can easily be exported as CSV for further use in QGIS or any other GIS software (e.g. ArcGIS).
If you have any questions, feel free to ask. The comments in the script are partly German and partly English, sorry for that mess :-). I tested the script by fetching Berlin and Potsdam real estate data.

 


#Author: Harald Schernthanner, based on: JSON data via an API with multiple pages: https://cran.r-project.org/web/packages/jsonlite/vignettes/json-paging.html
#Loop to fetch the data

install.packages("jsonlite")
install.packages('curl')
library(jsonlite)
  
baseurl <- "http://api.nestoria.de/api?country=de&pretty=1&encoding=json&action=search_listings&place_name=Potsdam&listing_type=rent"
#The number of pages determines how often we loop through the dataset
#A for loop iterates over the pages and writes the results into the list "pages"

pages <- list()
for(i in 0:24){
  mydata <- fromJSON(paste0(baseurl, "&page=", i))
  message("Retrieving page ", i)
  pages[[i+1]] <- mydata$response$listings
}

#Combine the individual queries
angebote_potsdam <- rbind.pages(pages)

#check output
nrow(angebote_potsdam)


#Export the data

write.csv2(angebote_potsdam, "potsdam.csv")
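As a possible refinement (a sketch, not part of the script above), the hard-coded page count could be replaced by a loop that stops as soon as a page comes back empty, assuming the API returns an empty listings element beyond the last page:

#Sketch: stop paging when the API returns no more listings instead of hard-coding 25 pages
pages <- list()
i <- 1
repeat {
  mydata   <- fromJSON(paste0(baseurl, "&page=", i))
  listings <- mydata$response$listings
  if (is.null(listings) || NROW(listings) == 0) break   # empty page: no more results (assumption)
  pages[[i]] <- listings
  i <- i + 1
}
angebote_potsdam <- rbind.pages(pages)
nrow(angebote_potsdam)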



Housing prices with service costs in Potsdam inner city
