Fun with Anthropic Claude and image file geo-location

I've been a photographer for decades, and as a result I have over 100,000 image files in various formats, many of which include location information. I thought it would be fun to experiment with this location data. The task turned out to be a bit tougher than I expected, but it was a great chance to see how well Anthropic Claude could help debug. I ended up with 11 versions of the code as I fixed errors and expanded the scope. To keep this post brief, I'll share only the final code and skip the interim versions. However, I will include all of my exchanges with Claude so you can see how this process works in a real-world scenario. I never touched the code that Claude generated; had I wanted to edit it by hand, I would have run this in an IDE with the appropriate Claude integrations (see this article for some examples).

I started simple to make sure I could even extract location data from my test images and asked Claude (3.5 Sonnet) to write some code for me:

My Prompt:

Write a Python script that takes a list of image files as parameters. For each image file, list the filename, date created, and location information. Image formats will include jpg, nef, and heic (Apple iphone images).

As requested, Claude created some Python code for me and asked me to install the Pillow and piexif libraries:

pip install Pillow piexif

I tried the generated code, got an error, and told Claude about it:

I get the following error when testing with a jpg:
Traceback (most recent call last):
  File "/Users/gregwilson/Downloads/imageplay/image-metadata-script.py", line 87, in <module>
    main(sys.argv[1:])
  File "/Users/gregwilson/Downloads/imageplay/image-metadata-script.py", line 74, in main
    info = process_image(image_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gregwilson/Downloads/imageplay/image-metadata-script.py", line 63, in process_image
    location = get_gps_info(exif_data)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gregwilson/Downloads/imageplay/image-metadata-script.py", line 44, in get_gps_info
    return f"{lat:.6f}, {lon:.6f}"
           ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unsupported format string passed to Fraction.__format__

Claude responded:

I apologize for the oversight. The error you're encountering is due to the GPS coordinates being stored as Fraction objects in some EXIF data, which can't be formatted directly with the f-string. Let's modify the get_gps_info function to handle this case.

These changes should resolve the error you were seeing. The script now handles GPS coordinates stored as tuples (degrees, minutes, seconds) or as Fraction objects, which are common formats for GPS data in EXIF.

This updated version should work with JPG files that store GPS coordinates in various formats. It still maintains compatibility with NEF and HEIC files as well.
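
To make the fix concrete, here is a minimal sketch of the kind of conversion Claude describes: turning a (degrees, minutes, seconds) tuple of Fraction values into a plain float that an f-string can format. The function name dms_to_decimal and the sample values are hypothetical illustrations, not the code Claude generated at this step.

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) to signed decimal degrees.

    Each part may be an int, a float, or a Fraction; float() accepts all three.
    """
    degrees, minutes, seconds = (float(part) for part in dms)
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern and western hemispheres are negative by convention.
    return -decimal if ref in ("S", "W") else decimal


# Hypothetical sample: 48 degrees, 51 minutes, 43.83 seconds north.
lat = dms_to_decimal((Fraction(48), Fraction(51), Fraction(4383, 100)), "N")
print(f"{lat:.6f}")  # formatting works because lat is now a float
```

Converting to float before formatting sidesteps the TypeError, at the cost of the exact rational value, which does not matter at six decimal places.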

Everything worked with jpg files, but my heic files didn't, so I told Claude about it:

Date created and location are unknown with all heic files

Claude responded:

I apologize for the oversight with HEIC files. You're right that the current implementation isn't correctly extracting the date and location information from HEIC files. Let's modify the script to properly handle HEIC metadata. We'll use the pyheif library in addition to piexif to better process HEIC files.

Claude asked me to install pyheif (pip install pyheif), but I got errors trying to install it, so I told Claude about it:

pyheif won't install

Claude responded:

I apologize for the difficulty you're experiencing with installing pyheif. You're right that it can sometimes be challenging to install due to its dependencies. Let's try an alternative approach using pillow-heif, which is generally easier to install and use. Here's an updated version of the script that uses pillow-heif instead of pyheif:

I installed pillow-heif as instructed and tried the newly generated code, but got another error, so I told Claude about it:

I'm getting the following error:
Traceback (most recent call last):
  File "/Users/gregwilson/Downloads/imageplay/image-metadata-script.py", line 82, in <module>
    main(sys.argv[1:])
  File "/Users/gregwilson/Downloads/imageplay/image-metadata-script.py", line 69, in main
    info = process_image(image_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gregwilson/Downloads/imageplay/image-metadata-script.py", line 58, in process_image
    location = get_gps_info(exif_data)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gregwilson/Downloads/imageplay/image-metadata-script.py", line 38, in get_gps_info
    lat = gps_info.get(2)
          ^^^^^^^^^^^^
AttributeError: 'int' object has no attribute 'get'

Claude responded:

I apologize for the oversight. The error you're encountering suggests that the GPS information is not being stored as expected in the EXIF data. Let's modify the get_gps_info function to handle different possible structures of GPS data.
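
One plausible cause of this error, assuming the script read EXIF through Pillow's Image.getexif(): the GPSInfo tag (0x8825) in the top-level EXIF mapping can hold an integer offset to the GPS directory rather than the directory itself, so calling .get() on it fails. The sketch below guards against that. read_gps_ifd is a hypothetical helper, and it only assumes the object offers get() and, optionally, a Pillow-style get_ifd() method.

```python
GPS_IFD_TAG = 0x8825  # EXIF tag id of the GPSInfo directory


def read_gps_ifd(exif):
    """Return the GPS tags as a dict, or {} when none can be resolved.

    `exif` is assumed to behave like a mapping of tag id -> value.
    """
    # Prefer a Pillow-style get_ifd(), which resolves the offset into a mapping.
    get_ifd = getattr(exif, "get_ifd", None)
    if callable(get_ifd):
        return dict(get_ifd(GPS_IFD_TAG))

    # Otherwise accept the value only if it is already a mapping;
    # a bare int here is an offset, not usable GPS data.
    value = exif.get(GPS_IFD_TAG)
    return dict(value) if isinstance(value, dict) else {}


print(read_gps_ifd({GPS_IFD_TAG: 1234}))        # an int offset yields {}
print(read_gps_ifd({GPS_IFD_TAG: {1: "N"}}))    # a nested mapping is returned
```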

This worked for jpg files, so now we're getting somewhere! However, for heic files, it didn't work, so I told Claude about it:

Location is unknown in all heic files

Claude responded:

I apologize for the continued difficulty with HEIC files. It seems that the GPS data in HEIC files might be stored differently or not easily accessible through the current method. Let's try a different approach using the exif library, which is specifically designed to handle EXIF data and should work better with HEIC files.

I tried the new code, got a new error, and told Claude about it:

Getting following error on heic files:
Error reading EXIF data for IMG_1837.heic:

+--------+------------+-------+-------+------------------------+
| Offset | Access     | Value | Bytes | Format                 |
+--------+------------+-------+-------+------------------------+
|        |            |       |       | TiffHeader (Structure) |
| 0      | byte_order | 11565 | 2d 2d | tiff_byte_order        |
+--------+------------+-------+-------+------------------------+

Claude responded:

I apologize for the continued issues. It seems that the exif library is having trouble reading the HEIC files directly. Let's try a different approach using pillow_heif to open the HEIC files and then extract the EXIF data.
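
The table in the error message hints at what went wrong: the bytes 2d 2d are the ASCII characters "--", whereas a TIFF header (the structure EXIF data is built on) starts with either "II" for little-endian or "MM" for big-endian. In other words, the parser was pointed at bytes that are not the start of an EXIF block. A minimal sketch of that check, using a hypothetical helper name:

```python
def tiff_byte_order(header: bytes):
    """Return 'little', 'big', or None based on the first two header bytes."""
    marker = header[:2]
    if marker == b"II":   # Intel order, little-endian
        return "little"
    if marker == b"MM":   # Motorola order, big-endian
        return "big"
    return None           # not a valid TIFF byte-order mark


print(tiff_byte_order(b"II*\x00"))     # a well-formed little-endian header
print(tiff_byte_order(b"--"))          # the bytes from the error table
print(int.from_bytes(b"--", "big"))    # the same two bytes read as an integer
```

The last line reproduces the Value column of the table, since 0x2d2d is 11565.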

I downloaded the new code, and it worked! Below is the output:

python3 image-metadata-script.py *heic *jpg      
Filename: IMG_1837.heic
Date Created: 2023-12-26 21:27:59
Location: 37.659542, -122.294072
----------------------------------------
Filename: IMG_3724.heic
Date Created: 2023-08-08 19:59:54
Location: 37.659483, -122.294006
----------------------------------------
Filename: IMG_3725.heic
Date Created: 2023-08-08 20:01:46
Location: 37.659525, -122.293936
----------------------------------------
Filename: DSC_7148.jpg
Date Created: 2015-08-27 09:36:46
Location: 48.862175, 2.323205
----------------------------------------
Filename: PANO0001 Panorama-106.jpg
Date Created: 2023-09-07 18:17:21
Location: 37.659570, -122.293875
----------------------------------------
Filename: PANO0001 Panorama.jpg
Date Created: 2023-09-07 18:15:53
Location: 37.6759570, -122.293872
----------------------------------------

I obviously had to wrestle Claude a bit to get this working, but it's impressive nonetheless.

I decided to take it to the next level and have Claude add some code to convert the lat/long to city, state, country. I asked Claude:

That works. Now add the city, state, country of each image location

Claude responded:

Great to hear that the script is now working for your HEIC files! Let's enhance the script to add city, state, and country information for each image location. We'll use the geopy library to perform reverse geocoding based on the GPS coordinates.

I installed the required libraries, downloaded the new code, tried it, got an error, and told Claude about it:

Getting error:
Error getting location details: HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): Max retries exceeded with url: /reverse?lat=37.759541666666664&lon=-122.39407222222223&format=json&addressdetails=1 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)')))

Claude responded:

I apologize for the SSL certificate verification error you're encountering. This issue is likely related to your Python installation not having the proper SSL certificates. Let's modify our script to work around this issue while still maintaining security. We'll add an option to disable SSL verification, but we'll make sure to warn the user about the potential security implications.

Claude also provided instructions for how to get my SSL cert installed and working, but I didn't want to do that for this little exercise, so I downloaded the new code, tried it, and it worked! Below is the new output:

----------------------------------------
Filename: DSC_7148.jpg
Date Created: 2015-08-27 09:36:46
Location: 48.862175, 2.323205
Location Details: Paris, Île-de-France, France
----------------------------------------

I decided to try one more thing... I asked Claude:

Works perfectly. Now determine the distance between all images and determine the two images with the most distance between them.

Claude responded:

Great to hear it's working perfectly! Let's enhance the script to calculate the distances between all images and find the two images that are farthest apart. We'll use the geopy.distance module to calculate the distances between coordinates.
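
For readers who want to see what a pairwise distance search involves without installing geopy, here is a rough standard-library sketch. It swaps in the haversine formula on a spherical Earth, which is not the ellipsoidal geodesic calculation that geopy.distance performs, so results will differ slightly. The function names, the Earth radius constant, and the dictionary layout are assumptions made for illustration; the coordinates are taken from the output earlier in this post.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean radius; an assumption of the spherical model


def haversine_km(coord1, coord2):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, coord1)
    lat2, lon2 = map(math.radians, coord2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def farthest_pair(coords_by_name):
    """Return (name1, name2, distance_km) for the most distant pair, or None."""
    candidates = (
        (name1, name2, haversine_km(c1, c2))
        for (name1, c1), (name2, c2) in combinations(coords_by_name.items(), 2)
    )
    return max(candidates, key=lambda item: item[2], default=None)


coords = {
    "DSC_7148.jpg": (48.862175, 2.323205),
    "IMG_3724.heic": (37.659483, -122.294006),
    "IMG_3725.heic": (37.659525, -122.293936),
}
print(farthest_pair(coords))
```

Comparing every pair is quadratic in the number of images, which is fine for a handful of files but would need rethinking for a library of 100,000.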

I downloaded the new code and it worked perfectly. I gave it a handful of images and it output:

The two images farthest apart are:
1. DSC_7148.jpg (Paris, Île-de-France, France)
2. IMG_3724.heic (San Francisco, California, United States)
Distance between them: 8974.51 km

The final code (note that this version reports the distance in miles, whereas the run above reported kilometers):

import os
import sys
from datetime import datetime
from PIL import Image
from pillow_heif import register_heif_opener, HeifImagePlugin
from exif import Image as ExifImage
import io
from geopy.geocoders import Nominatim
from geopy.distance import geodesic
from geopy.exc import GeocoderTimedOut, GeocoderUnavailable
import time
import ssl
import certifi
import argparse
from itertools import combinations

# Register HEIF opener with Pillow
register_heif_opener()

def create_ssl_context(verify_ssl):
    if verify_ssl:
        return ssl.create_default_context(cafile=certifi.where())
    else:
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        return ctx

def initialize_geolocator(verify_ssl):
    ssl_context = create_ssl_context(verify_ssl)
    return Nominatim(user_agent="image_metadata_script", scheme='https', ssl_context=ssl_context)

def get_exif_data(image_path):
    try:
        if image_path.lower().endswith('.heic'):
            with Image.open(image_path) as img:
                exif_data = img.getexif()
                exif_bytes = io.BytesIO()
                img.save(exif_bytes, format='JPEG', exif=exif_data)
                exif_bytes.seek(0)
                return ExifImage(exif_bytes)
        else:
            with open(image_path, 'rb') as image_file:
                return ExifImage(image_file)
    except Exception as e:
        print(f"Error reading EXIF data for {image_path}: {str(e)}")
    return None

def get_creation_date(exif_data):
    try:
        if hasattr(exif_data, 'datetime_original'):
            return datetime.strptime(str(exif_data.datetime_original), '%Y:%m:%d %H:%M:%S')
        elif hasattr(exif_data, 'datetime'):
            return datetime.strptime(str(exif_data.datetime), '%Y:%m:%d %H:%M:%S')
    except Exception as e:
        print(f"Error parsing creation date: {str(e)}")
    return None

def convert_to_degrees(dms, ref):
    degrees = dms[0]
    minutes = dms[1] / 60.0
    seconds = dms[2] / 3600.0

    if ref in ['S', 'W']:
        return -(degrees + minutes + seconds)
    else:
        return degrees + minutes + seconds

def get_gps_info(exif_data):
    try:
        if hasattr(exif_data, 'gps_latitude') and hasattr(exif_data, 'gps_longitude'):
            lat = convert_to_degrees(exif_data.gps_latitude, exif_data.gps_latitude_ref)
            lon = convert_to_degrees(exif_data.gps_longitude, exif_data.gps_longitude_ref)
            return lat, lon
    except Exception as e:
        print(f"Error processing GPS info: {str(e)}")
    return None

def get_location_details(geolocator, lat, lon):
    max_retries = 3
    for attempt in range(max_retries):
        try:
            location = geolocator.reverse(f"{lat}, {lon}")
            if location:
                address = location.raw['address']
                city = address.get('city') or address.get('town') or address.get('village') or 'Unknown'
                state = address.get('state') or 'Unknown'
                country = address.get('country') or 'Unknown'
                return f"{city}, {state}, {country}"
            return "Unknown location"
        except (GeocoderTimedOut, GeocoderUnavailable) as e:
            if attempt < max_retries - 1:
                time.sleep(1)
            else:
                print(f"Error getting location details: {str(e)}")
                return "Unknown location"

def process_image(image_path, geolocator):
    filename = os.path.basename(image_path)
    exif_data = get_exif_data(image_path)

    creation_date = get_creation_date(exif_data) if exif_data else None
    gps_info = get_gps_info(exif_data) if exif_data else None

    location = "Unknown"
    location_details = "Unknown location"
    if gps_info:
        lat, lon = gps_info
        location = f"{lat:.6f}, {lon:.6f}"
        location_details = get_location_details(geolocator, lat, lon)

    return {
        "filename": filename,
        "creation_date": creation_date.strftime('%Y-%m-%d %H:%M:%S') if creation_date else "Unknown",
        "location": location,
        "location_details": location_details,
        "gps_coords": gps_info
    }

def calculate_distances(image_infos):
    distances = []
    for (img1, info1), (img2, info2) in combinations(image_infos.items(), 2):
        if info1['gps_coords'] and info2['gps_coords']:
            distance = geodesic(info1['gps_coords'], info2['gps_coords']).miles
            distances.append((img1, img2, distance))
    return distances

def find_farthest_images(distances):
    if not distances:
        return None, None, 0
    return max(distances, key=lambda x: x[2])

def main(image_files, verify_ssl):
    geolocator = initialize_geolocator(verify_ssl)
    image_infos = {}

    for image_file in image_files:
        if os.path.exists(image_file):
            info = process_image(image_file, geolocator)
            image_infos[image_file] = info
            print(f"Filename: {info['filename']}")
            print(f"Date Created: {info['creation_date']}")
            print(f"Location: {info['location']}")
            print(f"Location Details: {info['location_details']}")
            print("-" * 40)
        else:
            print(f"File not found: {image_file}")
            print("-" * 40)

    distances = calculate_distances(image_infos)
    farthest_img1, farthest_img2, max_distance = find_farthest_images(distances)

    if farthest_img1 and farthest_img2:
        print(f"\nThe two images farthest apart are:")
        print(f"1. {image_infos[farthest_img1]['filename']} ({image_infos[farthest_img1]['location_details']})")
        print(f"2. {image_infos[farthest_img2]['filename']} ({image_infos[farthest_img2]['location_details']})")
        print(f"Distance between them: {max_distance:.2f} miles")
    else:
        print("\nUnable to calculate distances. Ensure at least two images have valid GPS coordinates.")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Process image metadata, location information, and calculate distances in miles.")
    parser.add_argument("image_files", nargs="+", help="Image files to process")
    parser.add_argument("--no-verify-ssl", action="store_true", help="Disable SSL certificate verification (use with caution)")
    args = parser.parse_args()

    if args.no_verify_ssl:
        print("WARNING: SSL certificate verification is disabled. This is not recommended for production use.")
        print("To fix SSL issues properly, try updating your SSL certificates:")
        print("pip install --upgrade certifi")
        print("If issues persist, consult your system administrator or Python installation documentation.")
        print()

    main(args.image_files, not args.no_verify_ssl)

Going through this exercise gave me a much better sense of the kind of coding that Claude and other LLMs are capable of and how debugging with them can work.

For my next project, I might take all of the images from a specific year, extract the location, and create a KML file so I can visualize all of the images in Google Earth. I have no doubt that I could get this done quickly.
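
As a rough idea of what that could involve, here is a minimal sketch that writes one KML Placemark per image using only the standard library. It is untested against Google Earth; the function name build_kml and the (name, lat, lon) input layout are assumptions, and the sample coordinates come from the output earlier in this post.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def _tag(name):
    """Qualify an element name with the KML namespace."""
    return f"{{{KML_NS}}}{name}"


def build_kml(points):
    """Build a KML document string from an iterable of (name, lat, lon)."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(_tag("kml"))
    document = ET.SubElement(kml, _tag("Document"))
    for name, lat, lon in points:
        placemark = ET.SubElement(document, _tag("Placemark"))
        ET.SubElement(placemark, _tag("name")).text = name
        point = ET.SubElement(placemark, _tag("Point"))
        # KML lists longitude first, then latitude, then altitude.
        ET.SubElement(point, _tag("coordinates")).text = f"{lon},{lat},0"
    body = ET.tostring(kml, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_kml([("DSC_7148.jpg", 48.862175, 2.323205)]))
```

The longitude-first ordering is the detail most likely to trip this up, since the script above works with (lat, lon) pairs throughout.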