Dr. Francisco Guzmán

Scientist
Arabic Language Technologies
+(974) 4454 1224
I was excited to take part in the founding of this new Institute. Helping to create something from scratch appealed to me. I already knew some of the talented people at QCRI and could tell that something grand was brewing. Living in the intriguing Middle East, learning a new language, and interacting with a new culture were also a plus.

Research Focus at QCRI

Dr. Francisco Javier Guzmán Herrera is a Scientist in the Arabic Language Technologies department at the Qatar Computing Research Institute (QCRI). His work lies at the intersection of Language Technologies and Machine Learning. He has extensive experience in Machine Translation, a field he has worked in since 2006. His research has been published in top-tier venues such as the Annual Meeting of the Association for Computational Linguistics (ACL), the Conference on Empirical Methods in Natural Language Processing (EMNLP), and the International Conference on Computational Linguistics (COLING), among others. His research on using discourse information for machine translation evaluation has pushed the boundaries of that field: in 2014, his team's metric, DiscoTK, won the WMT 2014 metrics evaluation campaign.

Together with his team, he has participated in several international machine translation competitions, such as the Seventh Workshop on Statistical Machine Translation (WMT 2012), the International Workshop on Spoken Language Translation (IWSLT 2013), and the National Institute of Standards and Technology (NIST) Open MT Evaluation 2015, consistently obtaining top rankings for the Arabic-English and Spanish-English language pairs. In 2014, he won the “Best in show” award at the BBC’s NewsHack III – Language Tech event in London, U.K., where international teams competed to build the best language-technology application for news.

Dr. Guzmán enjoys coaching young minds. In the summer of 2012, he and Dr. Stephan Vogel started the first “Hot Summer/Cool research” internship program to introduce undergraduate students from local universities to the world of research. Since then, he has mentored more than a dozen students, challenging them with research questions related to language technologies.


Previous Experience

Before joining QCRI, he worked in teams on Speech Technology (Tecnológico de Monterrey, 2011) and Machine Translation (Carnegie Mellon University, 2008-2011). He obtained his PhD from the Instituto Tecnológico y de Estudios Superiores de Monterrey in Mexico, and was a visiting scholar at the Language Technologies Institute at Carnegie Mellon University from 2008 to 2009, where he took part in DARPA’s GALE evaluation program under the Rosetta (IBM) consortium.

Professional Associations and Awards

Awards:
  • Best in show award. 2014. We won the "Best in show" award for our Speech-to-Speech Translation demo at the BBC NewsHack III Hackathon in London, UK.
  • Best metric in competition. 2014. Our Machine Translation evaluation metric DiscoTK won first place in the WMT 2014 Metrics Task.
  • Best system for Arabic-English. 2013. Our Arabic-English and English-Arabic systems won first place in the IWSLT 2013 translation tasks according to the official metrics.
  • Best unconstrained system for Spanish-English. 2012. Our Spanish-English system was the best-performing unconstrained system according to the human evaluation at WMT 2012.
  • Best poster award. 2007. For the CICLing paper "Using Translation Paraphrases from Trilingual Corpora to Improve Phrase-Based Statistical Machine Translation: A Preliminary Report", co-authored with Leonardo Garrido.
  • EPF Special Jury Mention. 2004.
  • The Washington Center for Internships and Seminars. NAFTA Leader's Program Scholarship. 2004.
  • General Electric Foundation Scholarship. 2000-2004.

Education

  • PhD in Information Technology and Communications, Tecnológico de Monterrey, Mexico. 2011.
  • Master's Degree in Engineering (Double Diplôme, DDI), EPF École d’Ingénieurs, France. 2004.
  • Physics Engineering, Tecnológico de Monterrey, Mexico. 2004.

Selected Research

  • Learning to Differentiate Better from Worse Translations; Francisco Guzmán, Shafiq Joty, Lluís Màrquez, Alessandro Moschitti, Preslav Nakov, and Massimo Nicosia. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 214-220, 2014.
  • Using Discourse Structure Improves Machine Translation Evaluation; Francisco Guzmán, Shafiq Joty, Lluís Màrquez, and Preslav Nakov. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14), pages 687-698, 2014.
  • The AMARA Corpus: Building Resources for Translating the Web's Educational Content; Francisco Guzmán, Hassan Sajjad, Ahmed Abdelali, and Stephan Vogel. In Proceedings of the 10th International Workshop on Spoken Language Translation (IWSLT'13), 2013.
  • A Tale about PRO and Monsters; Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL'13), pages 12-17, 2013.
  • Optimizing for Sentence-Level BLEU+1 Yields Short Translations; Preslav Nakov, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012), pages 1979-1994, 2012.
  • Word Alignment Revisited; Francisco Guzmán, Qin Gao, Jan Niehues, and Stephan Vogel. In Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, pages 164-175. Joseph Olive, Caitlin Christianson, and John McCary (Eds.). Springer Science & Business Media, 2011.
  • Reassessment of the Role of Phrase Extraction in PBSMT; Francisco Guzmán, Qin Gao, and Stephan Vogel. In Machine Translation Summit XII, Ottawa, Canada, August 2009.
  • QCRI at WMT12: Experiments in Spanish-English and German-English Machine Translation of News Text; Francisco Guzmán, Preslav Nakov, Ahmed Thabet, and Stephan Vogel. In the Seventh Workshop on Statistical Machine Translation (WMT), 2012.
  • EMDC: A Semi-supervised Approach for Word Alignment; Qin Gao, Francisco Guzmán, and Stephan Vogel. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), pages 349-357, August 2010.
  • Translation Paraphrases for Phrase-Based Statistical Machine Translation; Francisco Guzmán and Leonardo Garrido. In Computational Linguistics and Intelligent Text Processing, 9th International Conference (CICLing 2008), Haifa, Israel, February 2008.
