Here’s a tricky task. Pick a photograph from the Web at random. Now try to work out where it was taken using only the image itself. If the image shows a famous building or landmark, such as the Eiffel Tower or Niagara Falls, the task is straightforward. But the job becomes significantly harder when the image lacks specific location cues or is taken indoors or shows a pet or food or some other detail.
Nevertheless, humans are surprisingly good at this task. To help, they bring to bear all kinds of knowledge about the world such as the type and language of signs on display, the types of vegetation, architectural styles, the direction of traffic, and so on. Humans spend a lifetime picking up these kinds of geolocation cues.
So it’s easy to think that machines would struggle with this task. And indeed, they have.