Can an algorithm predict which businesses will close?

A closed store in New York City. Image: Getty.

Over the past decade, changes in the way people shop have led more and more businesses to close their doors, from small music venues to book shops and even major department stores. This trend has been attributed to several factors, including a shift towards online shopping and changing spending preferences. But business closures are complex, and often due to many intertwined factors.

To better understand and account for some of these factors, my colleagues at the University of Cambridge and Singapore Management University and I built a machine learning model, which predicted shop closures in ten cities around the world with 80 per cent accuracy.

Our research modelled how people move through urban areas, to predict whether a given business will close down. This research could help city authorities and business owners to make better decisions, for example about licensing agreements and opening hours.

Pattern spotting

Machine learning is a powerful tool which can automatically identify patterns in data. A machine learning model uses those patterns to test hypotheses and make predictions. Social media provides a rich source of data to examine the patterns of its users through their posts, interactions and movements. The detail in these datasets can help researchers to build robust models, with a complex understanding of user trends.

Using data about consumer demand and transport, along with ground-truth data on whether businesses actually closed, we devised metrics which our machine learning model used to identify patterns. We then analysed how well this model predicted whether a business would close, given only metrics about that business and the area it was in.
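To make the evaluation step concrete, here is a minimal sketch in Python of how predictions might be scored against ground-truth closures. The function name, the toy labels and the choice of plain accuracy as the score are assumptions for illustration only; the article does not describe the model or its evaluation procedure in this detail.

```python
def accuracy(predicted, actual):
    """Fraction of businesses whose predicted label matches the ground truth."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one business to score")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five businesses: True means the business closed.
actual_closed = [True, False, False, True, False]
predicted_closed = [True, False, True, True, False]

score = accuracy(predicted_closed, actual_closed)
```

In this toy example the prediction is wrong for one business out of five, so the score is four fifths.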

Our first dataset was from Foursquare, a location recommendation platform, which included check-in details of anonymous users and represented the demand for businesses over time. We also used data from taxi trajectories, which gave us the pickup and drop-off points of thousands of anonymous users; these represented the dynamics of how people move between different areas of a city. We used historic data from 2011 to 2013.

Taxiiii! Image: Sunset Noir/Flickr/creative commons.

We looked at a few different metrics. The neighbourhood profile took into account the area surrounding a business, such as the different kinds of businesses also operating, as well as competition. Customer visit patterns represented how popular a business was at any given time of day, compared with its local competitors. And business attributes defined basic properties such as the price bracket and type of business.

These three metrics enabled us to model how closure predictions differ between new and established venues, how the predictions varied across cities and which metrics were the most significant predictors of closure. We were able to predict the closure of established businesses more accurately, which suggested that new businesses can face closure from a bigger variety of causes.


Making predictions

We found that different metrics were useful for predicting closures in different cities. But across the ten cities in our experiment – Chicago, London, New York, Singapore, Helsinki, Jakarta, Los Angeles, Paris, San Francisco and Tokyo – we saw that three factors were almost always significant predictors of a business’s closure.

The first important factor was the range of time during which a business was popular. We found that businesses which cater to only specific customer segments – for example, a café popular with office workers at lunchtime – are more likely to close. It also mattered when a business was popular, compared with its competitors in the neighbourhood. Businesses that were popular outside of the typical hours of other businesses in the area tended to survive longer.
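One way to picture these temporal measures is the rough Python sketch below. It counts the hours in which a business draws a meaningful share of its check-ins, and measures how much its daily pattern overlaps with that of nearby competitors. The function names, the 3 per cent threshold and the toy check-in counts are all assumptions for illustration, not the metrics used in the research.

```python
def hourly_shares(checkins_by_hour):
    """Convert hourly check-in counts into shares that sum to 1."""
    total = sum(checkins_by_hour)
    if total == 0:
        return [0.0] * len(checkins_by_hour)
    return [count / total for count in checkins_by_hour]


def popular_hours(checkins_by_hour, min_share=0.03):
    """Count the hours that each hold at least min_share of all check-ins."""
    return sum(share >= min_share for share in hourly_shares(checkins_by_hour))


def overlap_with_competitors(business, competitors):
    """Overlap between a business's hourly pattern and its competitors'.

    Returns a value from 0 (popular at entirely different hours)
    to 1 (identical pattern across the day).
    """
    own = hourly_shares(business)
    pooled = hourly_shares([sum(hour) for hour in zip(*competitors)])
    return sum(min(a, b) for a, b in zip(own, pooled))


# Hypothetical check-ins for each hour of the day, midnight to 11pm.
lunch_cafe = [0] * 12 + [30, 20] + [0] * 10  # busy only from noon to 2pm
all_day_diner = [0] * 8 + [10] * 14 + [0] * 2  # steady from 8am to 10pm

cafe_hours = popular_hours(lunch_cafe)
diner_hours = popular_hours(all_day_diner)
cafe_overlap = overlap_with_competitors(lunch_cafe, [all_day_diner])
```

Under these assumptions the café is popular in only two hours of the day while the diner is popular in fourteen, which is the kind of narrow window the finding above associates with a higher risk of closure.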

We also found that when the diversity of businesses declined, the likelihood of closure increased. So businesses located in neighbourhoods with a more diverse mix of businesses tended to survive longer.
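The article does not say how diversity was measured. As a hedged illustration, the Python sketch below uses Shannon entropy over business categories, a common way to quantify a mix. The function name and the example neighbourhoods are hypothetical.

```python
import math
from collections import Counter


def category_diversity(categories):
    """Shannon entropy (in bits) of the business categories in a neighbourhood."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical neighbourhoods, each listed as one category per business.
mixed_street = ["cafe", "bookshop", "bar", "grocer"]
cafe_street = ["cafe", "cafe", "cafe", "cafe"]

mixed_score = category_diversity(mixed_street)
cafe_score = category_diversity(cafe_street)
```

A street with four different kinds of business scores higher than a street of four cafés, which scores zero; a falling score over time would correspond to the declining diversity described above.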

Of course, like any dataset, the information we used from Foursquare and taxis is biased in some ways, as the users may be skewed towards certain demographics or check in to some types of businesses more than others. But by using two datasets which target different kinds of users, we hoped to mitigate those biases. And the consistency of our analysis across multiple cities gave us confidence in our results.

We hope that this novel approach to predicting business closures with highly detailed datasets will help reveal new insights about how consumers move around cities, and inform the decisions of business owners, local authorities and urban planners right around the world.

The Conversation

Krittika D'Silva, PhD Candidate, University of Cambridge.

This article is republished from The Conversation under a Creative Commons license. Read the original article.


London’s rail and tube map is out of control

Aaaaaargh. Image: Getty.

The geographical limits of London’s official rail maps have always been slightly arbitrary. Far-flung commuter towns like Amersham, Chesham and Epping are all on there, because they have tube stations. Meanwhile, places like Esher or Walton-on-Thames – much closer to the city proper, inside the M25, and a contiguous part of the built-up area – aren’t, because they fall outside Greater London and aren’t served by Transport for London (TfL) services. This is pretty aggravating, but we are where we are.

But then a few years ago, TfL decided to show more non-London services on its combined Tube & Rail Map. It started with a few stations slightly outside the city limits, but where you could use your Oyster card. Then said card started being accepted at Gatwick Airport station – and so, since how to get to a major airport is a fairly useful piece of information to impart to passengers, TfL’s cartographers added that line too, even though it meant including stations bloody miles away.

And now the latest version seems to have cast all logic to the wind. Look at this:

Oh, no. Image: TfL.

The logic for including the line to Reading is that it’s now served by TfL Rail, a route which will be part of the Elizabeth Line/Crossrail, when they eventually, finally happen. But you can tell something’s gone wrong here from the fact that showing the route, to a town which is well known for being directly west of London, requires an awkward right-angle which makes it look like the line turns north, presumably because otherwise there’d be no way of showing it on the map.

What’s more, this means that a station 36 miles from central London gets to be on the map, while Esher – barely a third of that distance out – doesn’t. Nor does Windsor & Eton Central, because it’s served by a branchline from Slough rather than TfL Rail trains, even though as a fairly major tourist destination it’d probably be the sort of place that at least some users of this map might want to know how to get to.

There’s more. Luton Airport Parkway is now on the map, presumably on the basis that Gatwick is. But that station doesn’t accept Oyster cards yet, so you get this:

Gah. Image: TfL.

There’s a line, incidentally, between Watford Junction and St Albans Abbey, which is just down the road from St Albans City. Is that line shown on the map? No it is not.

Also not shown on the map: either Luton itself, just one stop up the line from Luton Airport Parkway, or Stansted Airport, even though it’s an airport and not much further out than places which are on the map. Somewhere that is, however, is Welwyn Garden City, which doesn’t accept Oyster, isn’t served by TfL trains and also – this feels important – isn’t an airport.

And meanwhile a large chunk of Surrey suburbia inside the M25 isn’t shown, even though it must have a greater claim to be a part of London’s rail network than bloody Reading.

The result of all these decisions is that the map covers an entirely baffling area whose shape makes no sense whatsoever. Here’s an extremely rough map:

Just, what? Image: Google Maps/CityMetric.

I mean that’s just ridiculous isn’t it.

While we’re at it: the latest version shows the piers from which you can get boats on the Thames. Except for when it doesn’t because they’re not near a station – for example, Greenland Pier, just across the Thames to the west of the Isle of Dogs, shown here with CityMetric’s usual artistic flair.

Spot the missing pier. You can’t, because it’s missing. Image: TfL/CityMetric.

I’m sure there must be a logic to all of this. It’s just that I fear the logic is “what makes life easier for the TfL cartography team” rather than “what is actually valuable information for London’s rail passengers”.

And don’t even get me started on this monstrosity.

Jonn Elledge is the editor of CityMetric. He is on Twitter as @jonnelledge and on Facebook as JonnElledgeWrites.