I recently learned that Mac users often create spreadsheets with Numbers, which saves .numbers files by default. These files are supposed to be compatible with LibreOffice, but version 7.3.7.2 was not able to process them in my case.
Several online tools claim to convert these files to other spreadsheet formats. Some of them work, but they limit functionality to push you toward a premium subscription, which some sell only as a yearly plan.
I know pandas can read several spreadsheet formats, but .numbers is not one of them. Fortunately, there is a Python module called numbers-parser that can handle such files, although with certain limitations.
In this article, I'll demonstrate how to read .numbers files with numbers-parser in a containerized environment and load them into pandas data frames for further analysis.
Implementation
To simplify reproduction, I'll use a containerized environment with Jupyter, so the code runs on any operating system that supports Docker. If you prefer to work without Docker, you can install the required packages directly on your system by following the steps outlined in the Dockerfile.
First, clone the repository:
git clone https://github.com/nezhar/jupyter-docker-compose
Next, navigate to the jupyter-docker-compose directory and modify the Dockerfile located in docker/jupyter. This is necessary because numbers-parser requires libsnappy-dev for installation, as well as gcc and g++:
FROM jupyter/minimal-notebook:lab-4.0.2
USER root
RUN apt update && apt install -y libsnappy-dev gcc g++
USER $NB_UID
COPY ./requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt
Then, extend the requirements.txt with the following lines:
numpy==1.26.4
pandas==2.2.1
matplotlib==3.8.3
numbers-parser==4.10.4
Finally, build the image:
docker compose build
Usage
To start a container with the newly created image, run:
docker compose up
Open the Jupyter IDE in your browser by navigating to http://localhost:8888.
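Before loading any data, it can be worth confirming in a notebook cell that the image actually picked up the pinned packages. The sketch below uses only the standard library; the `check_versions` helper and the `EXPECTED` mapping are names made up for this illustration, with the versions copied from the requirements.txt shown earlier.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from requirements.txt (adjust if you pin differently).
EXPECTED = {
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "matplotlib": "3.8.3",
    "numbers-parser": "4.10.4",
}


def check_versions(expected):
    """Return {package: (wanted, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # the package is not installed at all
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


# An empty result means every pinned package is installed at the expected version.
print(check_versions(EXPECTED) or "all pinned packages match")
```

If the cell reports a mismatch, rebuilding the image is the likely fix.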
Create a directory in your workspace (the code below assumes it is named files) and upload your .numbers files.
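To check that the upload landed where the notebook expects it, a short listing helps. This is a minimal sketch that assumes the directory is named `files`, matching the path used in the loading code; `find_numbers_files` is a hypothetical helper, not part of numbers-parser.

```python
from pathlib import Path


def find_numbers_files(directory="files"):
    """Return the .numbers entries found directly inside `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []  # directory not created yet, so nothing to list
    # Compare the suffix case-insensitively so "REPORT.NUMBERS" is found too.
    return sorted(p for p in root.iterdir() if p.suffix.lower() == ".numbers")


for path in find_numbers_files():
    print(path)
```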
For demonstration, the code below uses an example file found on GitHub.
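import pandas as pd
from numbers_parser import Document

# Open the uploaded Numbers document.
doc = Document("files/Photo Worksheet.numbers")

# Build one data frame per table, across all sheets.
dfs = []
for sheet in doc.sheets:
    for table in sheet.tables:
        # values_only=True returns plain cell values instead of cell objects.
        data = table.rows(values_only=True)
        # Treat the first row as the header and the remaining rows as data.
        dfs.append(pd.DataFrame(data[1:], columns=data[0]))

dfs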
Running this code populates the dfs list with two data frames, one for each table in the example file.
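A plain list loses track of which sheet and table each data frame came from, and a header row with blank or repeated cells produces awkward column labels. The following is one possible variation, not the only way to do it: `unique_columns` and `load_tables` are hypothetical helpers, and the sketch assumes that empty header cells come back as `None` when `values_only=True` is used.

```python
def unique_columns(header):
    """Turn a raw header row into unique, non-empty column labels."""
    used = set()
    columns = []
    for position, value in enumerate(header):
        text = "" if value is None else str(value).strip()
        # Blank header cells get a positional placeholder name.
        base = text or f"column_{position}"
        label, suffix = base, 1
        # Append a numeric suffix until the label is unused.
        while label in used:
            label = f"{base}_{suffix}"
            suffix += 1
        used.add(label)
        columns.append(label)
    return columns


def load_tables(path):
    """Return {(sheet name, table name): DataFrame} for each non-empty table."""
    # Imported here so unique_columns stays usable without these packages.
    import pandas as pd
    from numbers_parser import Document

    frames = {}
    for sheet in Document(path).sheets:
        for table in sheet.tables:
            rows = table.rows(values_only=True)
            if not rows:
                continue  # nothing to load from an empty table
            frames[(sheet.name, table.name)] = pd.DataFrame(
                rows[1:], columns=unique_columns(rows[0])
            )
    return frames
```

Calling `load_tables("files/Photo Worksheet.numbers")` in the notebook would then return a dictionary whose keys show which sheet and table each data frame belongs to.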
Conclusion
With the setup complete and your .numbers files successfully loaded into data frames, you're now ready to explore and analyze your data within the Jupyter environment. This approach not only simplifies the process of working with .numbers files across different platforms but also leverages the powerful features of pandas for data manipulation and analysis.