Load Apple Numbers files in Python with Pandas using Containers

Posted by Harald Nezbeda on Wed 27 March 2024

Recently, I learned that Mac users work with Numbers when creating spreadsheets, and the application generates .numbers files by default. These files are supposed to be compatible with LibreOffice, but version 7.3.7.2 was not able to process them in my case.

There are several online tools that claim they can convert the file to other spreadsheet formats. Some of them work, but their functionality is limited as they aim to persuade you to buy a premium subscription, which some offer only for a year.

I know pandas can manage several spreadsheet formats, but .numbers is not one of them. Fortunately, there is a Python module called numbers-parser that can handle such files, although with certain limitations.

In this article, I'll demonstrate how to read .numbers files in Python with numbers-parser in a containerized environment, and how to load their tables into pandas data frames for further analysis.

Implementation

To simplify reproduction, I'll use a containerized environment with Jupyter. This makes it possible to run the code on any operating system that supports Docker. If you prefer running this without Docker, you can install the required packages directly on your system. Just make sure to follow the steps outlined in the Dockerfile.

First, clone the repository:

git clone https://github.com/nezhar/jupyter-docker-compose

Next, navigate to the jupyter-docker-compose directory and modify the Dockerfile located in docker/jupyter. This is necessary because numbers-parser requires libsnappy-dev for installation, as well as gcc and g++:

FROM jupyter/minimal-notebook:lab-4.0.2

USER root
RUN apt update && apt install -y libsnappy-dev gcc g++

USER $NB_UID
COPY ./requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

Then, extend the requirements.txt with the following lines:

numpy==1.26.4
pandas==2.2.1
matplotlib==3.8.3

numbers-parser==4.10.4

Finally, build the image:

docker compose build

Usage

To start a container with the newly created image, run:

docker compose up

Open the Jupyter IDE in your browser by navigating to http://localhost:8888.
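Before loading any files, it can help to confirm that the packages from requirements.txt actually made it into the image. The following is a minimal sketch that can be run in a notebook cell; the helper name installed_versions is my own and not part of the repository, and it relies only on importlib.metadata from the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    (Hypothetical helper for illustration, not part of the repository.)
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names as listed in requirements.txt
print(installed_versions(["numpy", "pandas", "matplotlib", "numbers-parser"]))
```

If any entry shows None, the image was probably built before requirements.txt was extended, and rebuilding it should fix that.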

Create a directory in your workspace and upload your .numbers files. For the demonstration, the code below uses an example file found on GitHub.

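import pandas as pd
from numbers_parser import Document

# Open the .numbers file uploaded to the workspace
doc = Document("files/Photo Worksheet.numbers")

dfs = []

# A document holds sheets, and each sheet holds one or more tables
for sheet in doc.sheets:
    for table in sheet.tables:
        # Fetch plain cell values; the first row is used as the header
        data = table.rows(values_only=True)
        dfs.append(pd.DataFrame(data[1:], columns=data[0]))

dfs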

For the example file, running this code populates the dfs list with two data frames, one for each table in the document.
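A plain list makes it hard to tell which data frame came from which table, and real-world spreadsheets sometimes have empty or repeated header cells. The sketch below is one possible way to deal with both. It assumes the same sheets, tables, name and rows(values_only=True) attributes used above; the helpers split_header and tables_by_name, and the column_N naming scheme, are my own illustrations and not part of numbers-parser.

```python
def split_header(rows):
    """Split table rows into (header, body), giving every column a usable name.

    Hypothetical helper: empty header cells become "column_<index>" and
    repeated names get a numeric suffix.
    """
    if not rows:
        return [], []
    header, seen = [], {}
    for idx, cell in enumerate(rows[0]):
        # Empty cells come back as None; fall back to a positional name
        text = "" if cell is None else str(cell).strip()
        name = text or f"column_{idx}"
        # Append a counter to names that were already used
        count = seen.get(name, 0)
        seen[name] = count + 1
        header.append(name if count == 0 else f"{name}_{count}")
    return header, [list(row) for row in rows[1:]]


def tables_by_name(doc):
    """Map (sheet name, table name) to the (header, body) pair of each table."""
    return {
        (sheet.name, table.name): split_header(table.rows(values_only=True))
        for sheet in doc.sheets
        for table in sheet.tables
    }
```

Each (header, body) pair can then be passed to pd.DataFrame(body, columns=header), in the same way as in the loop above, so that every data frame can be looked up by its sheet and table name.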

Conclusion

With the setup complete and your .numbers files successfully loaded into data frames, you're now ready to explore and analyze your data within the Jupyter environment. This approach not only simplifies the process of working with .numbers files across different platforms but also leverages the powerful features of pandas for data manipulation and analysis.