Dr. Francisco Guzmán

Arabic Language Technologies
+(974) 4454 1224
I was excited to take part in the founding of this new Institute. Helping to create something from scratch appealed to me. I already knew some of the talented people at QCRI, and could tell that something grand was brewing. Living in the intriguing Middle East, learning a new language, and interacting with a new culture were a plus as well.

Research Focus at QCRI

Dr. Francisco Javier Guzmán Herrera is a Scientist in the Arabic Language Technologies department at the Qatar Computing Research Institute (QCRI). His work lies at the intersection of Language Technologies and Machine Learning. He has extensive experience in Machine Translation, a field in which he has worked since 2006. His research has been published in top-tier venues such as the Association for Computational Linguistics (ACL), the Conference on Empirical Methods in Natural Language Processing (EMNLP), and the International Conference on Computational Linguistics (COLING), among others. Together with his team, he has participated in several international machine translation competitions, including the Seventh Workshop on Statistical Machine Translation (WMT) 2012, the International Workshop on Spoken Language Translation (IWSLT) 2013, and the National Institute of Standards and Technology (NIST) Open MT Evaluation 2015, consistently obtaining top rankings for the Arabic-English and Spanish-English language pairs. In 2014, he obtained the “Best in show” award at the BBC’s NewsHack III – Language Tech event in London, U.K., where international teams competed to build the best language-technology application for news. His research has pushed the boundaries of machine translation evaluation by using discourse information. In 2014, his team's metric (DiscoTK) won the WMT 2014 metrics evaluation campaign.

Dr. Guzmán enjoys coaching young minds. In the summer of 2012, he and Dr. Stephan Vogel started the first “Hot Summer/Cool research” internship program to introduce undergraduate students from local universities to the research world. Since then, he has mentored more than a dozen students, challenging them with research questions related to language technologies.


Previous Experience

Before joining QCRI, he collaborated in teams dealing with Speech Technology (Tecnológico de Monterrey, 2011) and Machine Translation (Carnegie Mellon University, 2008-2011). He obtained his PhD from the Instituto Tecnológico y de Estudios Superiores de Monterrey, in Mexico, and was a visiting scholar at the Language Technologies Institute at Carnegie Mellon University from 2008 to 2009, where he took part in DARPA’s GALE evaluation program as a member of the Rosetta (IBM) consortium.

Professional Associations and Awards

  • Best in show award. 2014. We won the "Best in show" award for our Speech-to-Speech Translation demo at the BBC NewsHack III Hackathon in London, UK.
  • Best metric in competition. 2014. Our Machine Translation evaluation metric DiscoTK won first place in the WMT 2014 Metrics task.
  • Best system for Arabic-English. 2013. Our Arabic-English and English-Arabic systems won first place in the IWSLT 2013 translation tasks according to official metrics.
  • Best unconstrained system Spanish-English. 2012. Our Spanish-English system was the best-performing unconstrained system according to human evaluation metrics at WMT 2012.
  • Best poster award. 2007. For the CICLing paper "Using Translation Paraphrases from Trilingual Corpora to Improve Phrase-Based Statistical Machine Translation: A Preliminary Report", with Leonardo Garrido.
  • EPF- Special Jury Mention. 2004.
  • The Washington Center for Internships and Seminars. NAFTA Leader's Program Scholarship. 2004.
  • General Electric Foundation Scholarship. 2000 – 2004.


Education

  • PhD in Information Technology and Communications, Tecnológico de Monterrey, Mexico. 2011.
  • Master's Degree in Engineering (DDI-Double Diplôme), EPF-École d’Ingénieurs, France. 2004.
  • Physics Engineering, Tecnológico de Monterrey, Mexico. 2004.

Selected Research

  • Learning to Differentiate Better from Worse Translations; Francisco Guzmán, Shafiq Joty, Lluís Màrquez, Alessandro Moschitti, Preslav Nakov, and Massimo Nicosia. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 214-220, 2014.
  • Using Discourse Structure Improves Machine Translation Evaluation; Francisco Guzmán, Shafiq Joty, Lluís Màrquez, and Preslav Nakov. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14), pages 687-698, 2014.
  • The AMARA Corpus: Building Resources for Translating the Web's Educational Content; Francisco Guzmán, Hassan Sajjad, Ahmed Abdelali, and Stephan Vogel. In Proceedings of the 10th International Workshop on Spoken Language Translation (IWSLT'13), 2013.
  • A Tale about PRO and Monsters; Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), pages 12-17, 2013.
  • Optimizing for Sentence-Level BLEU+1 Yields Short Translations; Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1979–1994, 2012.
  • Word Alignment Revisited; Francisco Guzmán, Qin Gao, Jan Niehues, and Stephan Vogel. In Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, pages 164-175. Joseph Olive, Caitlin Christianson, and John McCary (Eds). Springer Science & Business Media. 2011.
  • Reassessment of the Role of Phrase Extraction in PBSMT; Francisco Guzmán, Qin Gao, and Stephan Vogel. In Machine Translation Summit XII, Ottawa, Canada, August 2009.
  • QCRI at WMT12: Experiments in Spanish-English and German-English Machine Translation of News Text; Francisco Guzmán, Preslav Nakov, Ahmed Thabet, and Stephan Vogel. In Proceedings of the Seventh Workshop on Statistical Machine Translation (WMT), 2012.
  • EMDC: A Semi-supervised Approach for Word Alignment; Qin Gao, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 349-357, August 2010.
  • Translation Paraphrases for Phrase-Based Statistical Machine Translation; Francisco Guzmán and Leonardo Garrido. In Computational Linguistics and Intelligent Text Processing, 9th International Conference (CICLing 2008), Haifa, Israel, February 2008.


