The potential of the urban data revolution for research on cities

Written by:

Daniel Arribas-Bel

First Published:

21 Aug 2018, 4:42 am

Abstract: http://journals.sagepub.com/doi/full/10.1177/0042098018779554#abstract

In the last few years, an explosion of data about cities and urban life has taken place. Some authors are starting to talk about a revolution that will fundamentally change how we study and understand cities. These new forms of data range from credit card transactions, to phone calls made from our smartphones, to the myriad “digital breadcrumbs” we leave as we broadcast our lives through different social networks. A particular family within this last group is location-based services (LBSs), online platforms that allow their users to share their location. Some LBSs are embedded in larger services, such as the possibility of tagging a tweet with a pair of coordinates or the option of including a specific place in a Facebook post; other LBSs, however, take the “L” more seriously and build an entire product centered around location.

In a paper I recently published with Jessie Bakens, we use an example from the Netherlands to explore the potential of LBSs to inform urban research. Using restaurant data from Foursquare, the leading LBS company, we assess the validity of the data in two ways. First, we explore the extent to which Foursquare data are biased; second, we explore correlations that are often the subject of research on the attractiveness of cities. We think our results are relevant on both fronts.

As far as the first question is concerned, probably one of the main hurdles separating new forms of data from wider adoption in urban research is the concern that, unlike official surveys and data products such as Censuses, these datasets do not accurately represent an entire population and hence can lead to inappropriate conclusions. Such worries are justified. To characterise this potential distortion, we compare our Foursquare dataset with two official sources of information. Our results are most easily summarised in the figure below:

 

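Figure: three maps of the distribution of restaurants in the Netherlands (Foursquare data, the difference between the two sources, and the official source).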

The first and third maps display the distribution of restaurants as measured by our Foursquare dataset and the official source, respectively. More importantly, the central map shows the difference between the two, effectively pinpointing areas where more restaurants are likely to appear in the Foursquare data than in the official source (blue) or vice versa (red). What becomes clear when one studies the figure is that the distribution of this mismatch is far from random, with Foursquare capturing many more restaurants in areas like Amsterdam, The Hague, or Rotterdam, and somewhat fewer in the more rural northern part of the country. This skew towards richer, better educated, and more urban areas limits the Foursquare data but does not render them useless. In the paper, we argue that this is an important feature to take into account when drawing conclusions, and we provide some scenarios where this type of data can be useful even though it represents only a particular fraction of the population.
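As a rough illustration of the idea behind the central map, the sketch below compares the share of restaurants that each source places in each area. It is a simplified stand-in, not the method used in the paper: the function name `share_difference`, the area labels, and the counts are all invented for the example.

```python
def share_difference(foursquare_counts, official_counts):
    """Per-area difference in restaurant shares between two sources.

    Positive values mean the area is over-represented in the Foursquare
    counts relative to the official counts; negative values mean the
    opposite.
    """
    areas = sorted(set(foursquare_counts) | set(official_counts))
    fsq_total = sum(foursquare_counts.values())
    off_total = sum(official_counts.values())
    diff = {}
    for area in areas:
        fsq_share = foursquare_counts.get(area, 0) / fsq_total
        off_share = official_counts.get(area, 0) / off_total
        diff[area] = fsq_share - off_share
    return diff


# Toy counts, invented purely for illustration (not real data).
toy_foursquare = {"urban_a": 60, "urban_b": 30, "rural_c": 10}
toy_official = {"urban_a": 50, "urban_b": 25, "rural_c": 25}

diff = share_difference(toy_foursquare, toy_official)
for area, value in diff.items():
    print(f"{area}: {value:+.2f}")
```

Working with shares rather than raw counts keeps the comparison meaningful when the two sources differ in overall size; by construction the differences sum to zero, so any positive area must be offset by negative ones elsewhere.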

This more optimistic view is further supported by our regression analysis in the second part of the paper. We augment the information on each restaurant provided by Foursquare with characteristics of the hyper-local environment (within 500 m), also extracted from Foursquare as well as from official sources such as Statistics Netherlands. If the data were not picking up any significant variation across different types of restaurants and locations, we should not find any meaningful association between the popularity of a restaurant (as measured by check-ins) and its characteristics. By contrast, our analysis provides evidence of several significant and sensible correlations. For example, we find that, controlling for individual features, restaurants in areas with more shops are more popular, and that Turkish restaurants experience a popularity premium if they are located in predominantly Turkish neighborhoods. In the paper, we present other interesting findings along these lines and suggest interpretations for our results based on the relevant urban economics literature. If you would like to know more about our findings, please head directly to the published paper.
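To make the kind of regression described above more concrete, here is a minimal sketch of an ordinary least squares fit with an interaction term, written with the Python standard library only. It is not the specification estimated in the paper: the variable names (`shops_nearby`, `is_turkish`, `turkish_share`), the coefficients, and the data are hypothetical, and the outcome is generated without noise so that the fit can be checked against known values.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical restaurants: (shops_nearby, is_turkish, turkish_share).
restaurants = [
    (5, 0, 0.1), (20, 0, 0.4), (12, 1, 0.1),
    (8, 1, 0.5), (30, 0, 0.2), (15, 1, 0.3),
]

# Assumed coefficients used to simulate a popularity measure:
# intercept, shops nearby, Turkish cuisine, cuisine x neighbourhood share.
true_beta = [1.0, 0.05, 0.2, 2.0]

# Design matrix with an intercept and the interaction term.
X = [[1.0, shops, turkish, turkish * share]
     for shops, turkish, share in restaurants]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]

beta = ols(X, y)
print([round(b, 3) for b in beta])
```

In this setup, the last coefficient plays the role of the "popularity premium": it measures how much more popular a Turkish restaurant is as the Turkish share of its neighbourhood rises, holding the number of nearby shops fixed. With real data one would normally use a statistics library, which also reports standard errors.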

 

Read the paper in Urban Studies (Online First): http://journals.sagepub.com/doi/full/10.1177/0042098018779554