Projects

Flagship projects

Flagship project · 72 stars

Napolab

A benchmark for Portuguese language models that challenged assumptions about the advantage of Portuguese-only models and highlighted contamination and evaluation quality issues.

Backed by master's thesis at the University of Malta
Includes a live leaderboard on Hugging Face Spaces
Strong signal for evaluation rigor and benchmark design

Flagship project · 77 stars

Hashformers

Transformer and beam-search based hashtag segmentation system recognized as state of the art in LREC 2022.

Research-backed open-source package
Demonstrates applied NLP system design
Useful example of turning research into reusable tooling

Selected code

Supporting proof

COLIEE codebase

Supporting code for legal retrieval and legal entailment work, including a first-place zero-shot system.

Additional code, experiments, and open-source work live on GitHub.

Selected contributions

Argilla

Shipped semi-supervised learning related fixes and features in a production open-source NLP platform.

Hugging Face Transformers

Extended Trainer behavior for simultaneous Ray Tune and Weights & Biases execution.

BLINK

Fixed a benchmark script parameter bug in Meta AI's entity linking system.

Portuguese Word Embeddings

Corrected an evaluation bug later documented in published research.