Projects
Flagship projects
Napolab
A benchmark for Portuguese language models that challenged assumptions about the advantage of Portuguese-only models and highlighted contamination and evaluation quality issues.
- Backed by a master's thesis at the University of Malta
- Includes a live leaderboard on Hugging Face Spaces
- Demonstrates evaluation rigor and benchmark design
Hashformers
A hashtag segmentation system based on Transformers and beam search, recognized as state of the art at LREC 2022.
- Research-backed open-source package
- Demonstrates applied NLP system design
- Useful example of turning research into reusable tooling
Selected code
COLIEE codebase
Supporting code for legal retrieval and legal entailment work, including a first-place zero-shot system.
More
Additional code, experiments, and open-source work live on GitHub.
Selected contributions
Argilla
Shipped fixes and features related to semi-supervised learning in a production open-source NLP platform.
Hugging Face Transformers
Extended Trainer so that Ray Tune and Weights & Biases can run simultaneously.
BLINK
Fixed a parameter bug in the benchmark script of Meta AI's entity linking system.
Portuguese Word Embeddings
Corrected an evaluation bug later documented in published research.