Year: 2025

Machine Learning, Programming

VectorVFS: your filesystem as a vector database

PS: thanks for all the interest, here you are some discussions about VectorVFS as well:
Hacker News: discussion thread
Reddit: discussion thread

When I released EuclidesDB in 2018, which was the first modern vector database before milvus, pinecone, etc, I ended up still missing a piece of simple software for retrieval that can be local, easy to use and without requiring a daemon or any other sort of server and external indexes. After quite some time trying to figure it out what would be the best way to store embeddings in the filesystem I ended up in the design of VectorVFS.

timeline

The main goal of VectorVFS is to store the data embeddings (vectors) into the filesystem itself without requiring an external database. We don’t want to change the file contents as well and we also don’t want to create extra loose files in the filesystem. What we want is to store the embeddings on the files themselves without changing its data. How can we accomplish that ?

It turns out that all major Linux file systems (e.g. Ext4, Btrfs, ZFS, and XFS) support a feature that is called extended attributes (also known as xattr). This metadata is stored in the inode itself (ona reserved space at the end of each inode) and not in the data blocks (depending on the size). Some file systems might impose limits on the attributes, Ext4 for example requires them to be within a file system block (e.g. 4kb).

That is exactly what VectorVFS do, it embeds files (right now only images) using an encoder (Perception Encoder for the moment) and then it stores this embedding into the extended attributes of the file in the filesystem so we don’t need to create any other file for the metadata, the embedding will also be automatically linked directly to the file that was embedded and there is also no risks of this embedding being copied by mistake. It seems almost like a perfect solution for storing embeddings and retrieval that were there in many filesystems but it was never explored for that purpose before.

If you are interested, here is the documentation on how to install and use it, contributions are welcome !

 

Article, Machine Learning, Philosophy

Notes on Gilbert Simondon’s “On the Mode of Existence of Technical Objects” and Artificial Intelligence

Happy new year ! This is the first post of 2025 and this time it is not a technical article (but it is about philosophy of technology 😄)

Gilbert Simondon (1924-1989). Photo by LeMonde.

This is a short opinion article to share some notes on the book by the French philosopher Gilbert Simondon called “On the Mode of Existence of Technical Objects” (Du mode d’existence des objets techniques) from 1958. Despite his significant contributions, Simondon still (and incredibly) remains relatively unknown, and it seems to me that this is partly due to the delayed translation of his works. I realized recently that his philosophy of technology aligns very well with an actionable understanding of AI/ML. His insights illuminated a lot for me on how we should approach modern technology and what cultural and societal changes are needed to view AI as an evolving entity that can be harmonised with human needs. This perspective offers an alternative to the current cultural polarization between technophilia and technophobia, which often leads to alienation and misoneism. I think that this work from 1958 provides more enlightening and actionable insights than many contemporary discussions of AI and machine learning, which often prioritise media attention over public education. Simondon’s book is very dense and it was very difficult to read (I found it more difficult than Heidegger’s work on philosophy of technology), so in my quest to simplify it, I might be guilty of simplism in some cases.

(more…)