👋 Hi, I'm Ruan

Senior AI Engineer | Generative AI | RAG | LLMs | NLP | Python

🌐 Personal Website

🚀 Projects

  • ruanchaves/hashformers: State-of-the-art framework for hashtag segmentation.
    • Recognized as state-of-the-art at LREC 2022 in the paper “HashSet — A Dataset For Hashtag Segmentation” by researchers from IIT.
    • Leverages GPT-2 and beam search for accurate, multilingual hashtag and text segmentation.
    • Outperforms prior methods on standard benchmarks (STANsmall, BOUN).
  • ruanchaves/napolab: The Natural Portuguese Language Benchmark.
    • A curated collection of Portuguese datasets for rigorous LLM evaluation.
    • Key finding: the performance gap between general-purpose LLMs and Portuguese-specific models is smaller than previously believed—multilingual training is often sufficient.
    • Exposes systemic issues in LLM benchmarking (investigation gap, data contamination) that have inflated reported progress.
    • Browse the Napolab Leaderboard and stay up to date with the latest advancements in Portuguese language models.
    • Medium Article: The Hidden Truth About LLM Performance: Why Your Benchmark Results Might Be Misleading
      • Explores how benchmark contamination and investigation gaps have inflated reported LLM progress, with a focus on Portuguese language evaluation.
    • Master Thesis: Lessons learned from the evaluation of Portuguese language models
      • M.Sc. thesis (University of Malta, 2023). Traces Portuguese NLP from early embeddings to LLMs, presents Napolab as the main contribution, and concludes that multilingual models match Portuguese-specific ones.
*Screenshots: Napolab Leaderboard interface; model performance analysis.*
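The core idea behind hashformers, scoring candidate splits with a language model and keeping only the most promising hypotheses via beam search, can be sketched in a few lines. This is an illustrative toy, not the hashformers API: a hand-written unigram log-probability table stands in for GPT-2, and every word and probability below is invented for the example.

```python
import math

# Hypothetical unigram log-probabilities standing in for an LM score.
UNIGRAM_LOGP = {
    "new": math.log(0.05),
    "york": math.log(0.02),
    "city": math.log(0.03),
    "newyork": math.log(0.0001),
}
OOV_LOGP = math.log(1e-8)  # heavy penalty for unknown substrings

def score(word: str) -> float:
    return UNIGRAM_LOGP.get(word, OOV_LOGP)

def segment(text: str, beam_width: int = 5, max_word_len: int = 10):
    """Beam search over split points of `text`.

    beams[i] holds the top hypotheses (log_prob, segments) covering text[:i].
    """
    beams = {0: [(0.0, [])]}
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(max(0, i - max_word_len), i):
            word = text[j:i]
            for logp, segs in beams.get(j, []):
                candidates.append((logp + score(word), segs + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams[i] = candidates[:beam_width]  # prune to the beam width
    _, best_segs = beams[len(text)][0]
    return best_segs

print(segment("newyorkcity"))  # → ['new', 'york', 'city']
```

In the real library the per-word unigram score is replaced by a GPT-2 log-probability over the whole candidate segmentation, which is what makes the approach multilingual and accurate on unseen words.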

🌟 Contributions

Click on the links to view my pull requests.

  • argilla-io/argilla:
    • Fixed bugs and shipped features related to semi-supervised learning (SSL) during my internship at Argilla.
    • Context: Argilla is an open-source data curation and annotation platform for NLP. It was acquired by Hugging Face in June 2024 (~$10M deal) to strengthen their dataset tooling ecosystem.
  • huggingface/transformers:
    • Modified the Trainer class for simultaneous Ray Tune and Weights & Biases execution.
    • Context: Enabled parallel hyperparameter search with Ray Tune while logging all runs to W&B for experiment tracking—a common workflow for production model fine-tuning.
  • nathanshartmann/portuguese_word_embeddings:
    • Fixed a severe bug in the evaluation procedure.
    • Documented the bug fix in the research paper “Portuguese language models”.
    • Context: This repository provides pre-trained word embeddings for Portuguese. The bug fix corrected an evaluation error that affected reported benchmark scores.
  • facebookresearch/BLINK:
    • Fixed a parameter bug in the script for the BLINK benchmark.
    • Context: BLINK is Meta AI’s state-of-the-art entity linking system, using a bi-encoder + cross-encoder architecture over BERT to link text mentions to Wikipedia entities at scale (millions of candidates in milliseconds).
  • awslabs/mlm-scoring:
    • Fixed incorrect installation instructions for the mlm-scoring library.
    • Context: AWS Labs library for scoring sentences using masked language models (BERT, RoBERTa). Based on the ACL 2020 paper “Masked Language Model Scoring”—pseudo-log-likelihood scores outperform GPT-2 on acceptability judgments.
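The pseudo-log-likelihood (PLL) scoring idea from “Masked Language Model Scoring” can be sketched with a toy model: mask each token in turn and sum the log-probability the model assigns to the true token at the masked position. A dictionary of hypothetical fill probabilities stands in for a real MLM such as BERT; the numbers are invented for the example.

```python
import math

def pseudo_log_likelihood(tokens, mask_fill_prob):
    """Sum of log P(token_i | tokens with position i masked) over all i."""
    total = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(t if j != i else "[MASK]" for j, t in enumerate(tokens))
        total += math.log(mask_fill_prob(context, i, tok))
    return total

# Hypothetical model: fluent tokens get higher fill probabilities.
def toy_model(context, position, token):
    fluent = {"the": 0.4, "cat": 0.2, "sat": 0.3}
    return fluent.get(token, 0.01)

pll = pseudo_log_likelihood(["the", "cat", "sat"], toy_model)
```

A real MLM conditions on the bidirectional context, so PLL rewards sentences whose every token is predictable from both sides, which is why it works well for acceptability judgments.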

📖 Papers With Code


📧 Email: ruanchaves93@gmail.com