Blog posts

2025

5 minute read

Published: July 28, 2025

There are two systematic issues with benchmarks, which were already present in the community before the advent of LLMs but became worse…

1 minute read

Published: February 21, 2024

Today, Google launched a new set of models named Gemma. These models are based on the same tech and research used for creating the Gemini…

less than 1 minute read

Published: September 11, 2023

Napolab is here: a curated collection of Portuguese datasets designed for easy evaluation of language models.

less than 1 minute read

Published: June 03, 2023

Hashtag segmentation, the task of adding spaces between words in a hashtag, can now be done with Large Language Models (LLMs).

3 minute read

Published: May 31, 2023

It’s no secret that the tech landscape is dynamic and ever-evolving. New technologies are born, they mature, and then, often, they are…

2 minute read

Published: May 20, 2023

Abusive language detection, a critical aspect of modern NLP research, is often challenged by the lack of generalization across different…

less than 1 minute read

Published: May 18, 2023

I am thrilled to announce the latest milestone in the advancement of Portuguese language technology: the Albertina PT! This breakthrough…

2 minute read

Published: March 09, 2022

Word segmentation is the task of adding spaces between words. It can be an important preprocessing step in Natural Language Processing…

less than 1 minute read

Published: February 12, 2022

The cold start problem in NLP:

7 minute read

Published: February 02, 2022

A calendar in the Wake

2 minute read

Published: February 02, 2022

Update 03/21/2021: I published my modified version of run_glue.py as a public gist on GitHub.