Projects

Flagship projects

Flagship project · 72 stars

Napolab

A benchmark for Portuguese language models that challenged assumptions about the advantage of Portuguese-only models and highlighted contamination and evaluation quality issues.

  • Backed by master's thesis at the University of Malta
  • Includes a live leaderboard on Hugging Face Spaces
  • Strong signal for evaluation rigor and benchmark design

Flagship project · 77 stars

Hashformers

Transformer and beam-search based hashtag segmentation system recognized as state of the art in LREC 2022.

  • Research-backed open-source package
  • Demonstrates applied NLP system design
  • Useful example of turning research into reusable tooling

Selected code

Supporting proof

COLIEE codebase

Supporting code for legal retrieval and legal entailment work, including a first-place zero-shot system.

More

Additional code, experiments, and open-source work live on GitHub.

Selected contributions

Argilla

Shipped semi-supervised learning related fixes and features in a production open-source NLP platform.

Hugging Face Transformers

Extended Trainer behavior for simultaneous Ray Tune and Weights & Biases execution.

BLINK

Fixed a benchmark script parameter bug in Meta AI's entity linking system.

Portuguese Word Embeddings

Corrected an evaluation bug later documented in published research.