A new paper about cities 🏙️, their structure 🧩 and satellites 🛰️!

"Decoding (urban) form and function using spatially explicit deep learning" by me and @darribas

https://doi.org/10.1016/j.compenvurbsys.2024.102147

We took spatial signatures and built a neural network to pick them up from Sentinel-2. And since that alone didn't show great results, we mixed in some geographic data science to see if, and how much, it helps.

A quick 🧵 on work that took us 3 years to finish 🙃.

The inclusion of spatial modelling 'on top' of the predictions from the CNN consistently improved the performance of the modelling pipeline, while other techniques showed only occasional success.
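To give a flavour of what 'spatial modelling on top' of CNN outputs can look like, here is a minimal sketch that averages each cell's predicted class probabilities with those of its spatial neighbours before taking the argmax. This is an illustrative assumption, not the paper's actual method; the `neighbors` adjacency structure is hypothetical.

```python
import numpy as np

def smooth_probabilities(probs, neighbors):
    """Average each cell's CNN class probabilities with its spatial neighbors.

    probs: (n_cells, n_classes) array of per-cell class probabilities.
    neighbors: dict mapping cell index -> list of neighboring cell indices.
    """
    smoothed = np.empty_like(probs)
    for i in range(len(probs)):
        idx = [i] + neighbors.get(i, [])
        smoothed[i] = probs[idx].mean(axis=0)
    return smoothed

# Toy example: three cells in a row, the middle one is a noisy outlier.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.9, 0.1]])
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(smooth_probabilities(probs, neighbors).argmax(axis=1))  # [0 0 0]
```

The isolated misprediction in the middle cell gets pulled back toward its neighbours, which is the intuition behind spatially smoothing a per-chip classifier.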

In the end, we managed to improve the accuracy of the prediction by almost 0.2 in some cases, bringing the quality of the tricky signature prediction up to the level of common land use or land cover prediction models.


The link to the open access paper is in the first post. The code is available at https://urbangrammarai.xyz/signature_ai/ and the intermediate data at https://doi.org/10.6084/m9.figshare.26203367.

@martinfleis Awesome, this seems like a good thread to ask geo x ai questions 🤓

I always wondered why it's not more popular to take multiple scenes as inputs as in: video model or a transformer with positional embeddings encoding a sequence of scenes.

In addition, I expected positional encodings of time and space so that the model can learn: this chip is from North America in December, so e.g. I can expect snow in the image.

Most earth observation models still seem to work on image data only?

@djh I think one barrier here is bootstrapping ViT training, which as I understand it can be quite tricky at scale – they don't have the same convexity advantages as CNNs. I'm basing this mostly on sciencedirect.com/science/arti (presented well in youtube.com/watch?v=vacNCxMA_G).

@djh (I am kind of mildly anti-ViT, or anti ViTs being applied everywhere for no particular reason (see ConvNeXt, which I found pretty convincing), but I think the spatiotemporal stuff starts to get into the level of complexity where they might be justified.)

@vruba Wait, it's not even about transformers.

It's just that I rarely see anyone in the geo space treating remote sensing machine learning projects as geo tasks rather than simply pretending they're just images.

Examples, like I said, of adding positional encodings of time and space:

- e.g. day in the year so we can pick up that snow in winter is reasonable

- e.g. coordinates or at least some sort of quadkey token so we can pick up that snow in winter in Berlin is reasonable
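The two bullets above can be sketched as simple sinusoidal features. This is a minimal, hypothetical encoding (the function name and feature choices are my own), meant to be concatenated with image features so the model can relate "Berlin in December" to "snow is plausible":

```python
import numpy as np

def spacetime_encoding(lat, lon, day_of_year):
    """Cyclic features for season plus coordinates.

    Day-of-year is mapped onto a circle so day 1 and day 365
    end up close together; lat/lon get the same sin/cos treatment.
    """
    t = 2 * np.pi * day_of_year / 365.25
    phi, lam = np.radians(lat), np.radians(lon)
    return np.array([np.sin(t), np.cos(t),
                     np.sin(phi), np.cos(phi),
                     np.sin(lam), np.cos(lam)])

# Same place (Berlin), different seasons: only the time features differ.
berlin_dec = spacetime_encoding(52.52, 13.40, 350)
berlin_jun = spacetime_encoding(52.52, 13.40, 170)
```

A quadkey or learned location token would serve the same purpose as the coordinate features; the point is just that the model gets to see *where* and *when*, not only pixels.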


@djh For sure. I share this annoyance. Also:

– Not even trying to understand multispectral bands.

– Sample-normalizing calibrated, linear data, thereby losing information.

– Augmenting in ways that might make sense for snapshots but do not reflect anything that you would actually see in geospatial data.

– Assuming there is little information in fine details because they’re used to Bayer demosaicked images, where that’s interpolated anyway.
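The second bullet (sample-normalizing calibrated, linear data) is easy to demonstrate. In this toy example, assumed for illustration, two scenes share a spatial pattern but differ tenfold in absolute radiance; per-sample min-max scaling makes them indistinguishable:

```python
import numpy as np

# Two single-band "scenes" with the same spatial pattern but very
# different absolute radiance (think bright snow vs dark water).
bright = np.array([[0.80, 0.90], [0.85, 0.95]])
dark = bright * 0.1

def per_sample_minmax(x):
    """Scale one sample to [0, 1] using its own min and max."""
    return (x - x.min()) / (x.max() - x.min())

# After per-sample scaling the two scenes become identical:
# the calibrated, physically meaningful magnitude is gone.
print(np.allclose(per_sample_minmax(bright), per_sample_minmax(dark)))  # True
```

Normalizing with fixed, sensor-level statistics instead would keep the absolute brightness that distinguishes the two surfaces.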
