Arabic Language Technologies



At QCRI we are dedicated to promoting the Arabic language in the information age by conducting world-class research in Arabic language technologies.

Ensuring that the Arabic language flourishes in the digital world is a primary focus of our research. Several of our current research projects address the lack of Arabic content online and, equally important, the challenge of extracting that content.

QCRI strives to become the regional and global leader in Arabic language technologies, including search, information retrieval and analysis, multilingual language processing and advanced machine translation, and to lead efforts to increase and enrich Arabic language content online.

We are working hard to help close the gap caused by the lack of valuable Arabic content on the web by engaging in efforts that increase and enrich this content. Our partnership with the Wikimedia Foundation marked the first step in this initiative. Through our collaboration with the Arabic Wikipedian community, both the number of editors and their productivity have increased. We are working with universities and educational institutions to integrate Wikipedia into their curricula, which will also help increase and enrich online content by increasing knowledge and its consumption. The initial goal is to add 10,000 Arabic articles of the highest quality. “Online content” is not restricted to documents: we are also working with YouTube/Google on video content, as well as with social media platforms such as Twitter. A critical component of the initiative is an outreach program that communicates with Arabic Internet users to raise awareness and deliver the desired information.

QCRI’s initiatives address not only the lack of content but also the challenges of retrieving this content when it exists, making it accessible and enabling information to flow across language barriers. In this regard, we are developing Arabic language processing for the search domain, using morphological word analysis, named entity recognition and data-driven learning technology to detect relevant content that can be used for more elaborate analysis. In addition, we are developing proofing tools, such as typographical checks and language identification, as well as tools that handle the different forms of Arabic, including local dialects and Arabic written in Latin characters.

A major effort at QCRI goes into improving machine translation for both text and speech. Combining a “Speech-to-Text” engine that transcribes videos instantaneously with a machine translation system for Arabic provides access to broadcast news and to news distributed over the web. Future research will concentrate on applications such as lecture translation.

Through our work in search and information retrieval, we have developed services that go beyond basic search functionality, enabling more exploratory search and, in turn, better analytics of search results. We have built search functionality that is more scalable and more language-aware. Much of our work has been done in the social media domain, yet it is transferable to other domains. Our expertise in natural language processing and machine translation has laid the foundation for this research.

To bridge a gap identified in the education domain, we have established e-education projects that enable people to access and learn material in a language other than their native one. An e-book reader with native support for the Arabic language and an assistive language tutor are examples of such tools, which will have an immediate impact on society and learning.

We have collaborated closely on our projects with many local and international organizations, including Al Jazeera, MIT and the Qatar Supreme Education Council.

Some of our achievements and focus areas include:

  • Arabic speech recognition and understanding in formal Arabic (الفصحى), in various colloquial Arabic dialects (اللهجات العامية), and in mixtures of these.
  • Machine translation of non-Arabic content (news, scientific articles, etc.), made available on the web for easier access by Arabic speakers.
  • Arabic information storage and retrieval, including keyword and semantic content indexing, search, summarization, and understanding.
  • Multilingual search involving on-the-fly translation of non-Arabic content in response to queries in Arabic.
  • Creation of computational language models for Modern Standard Arabic suitable for algorithmic manipulation in support of the above activities.
  • Development of Arabic language tutoring systems to teach Arabic to native speakers (K-12 students) as well as to professionals whose native language is not Arabic.

For technical or informational questions, please send an email to QCRI Careers with the name of the group your question is for (e.g., ALT, CS&E, Cyber Security, Data Analytics, Distributed Systems or Social Computing) in the subject line.

Research Director

Dr. Stephan Vogel

Being part of a research institute in start-up mode, helping to build a strong team doing world class research, and at the same time experiencing a different environment in terms of culture and language, geography and climate.

Principal Scientist

Dr. Lluis Marquez

Coming to QCRI represents facing very exciting research challenges, which are tightly connected to real applications that can impact people interaction and access to information; Contributing to QCRI’s ambitious goals surrounded by a strong team of researchers in an international and dynamic work environment is a real privilege

Principal Scientist

Fabrizio Sebastiani

With very strong research groups in (among others) language technologies, data analytics, and social computing, QCRI is a data scientist’s heaven and an opportunity to be at the forefront of today’s research.

Principal Scientist

Dr. Kemal Oflazer

QCRI offers an excellent chance to collaborate with top researchers in language technologies and I hope to contribute to this

Principal Scientist

Dr. Alessandro Moschitti

QCRI is an advanced research center focused on modern ICT areas: it has the potential to become a top ranked center for such technologies.
  • ALT Brochure
  • QATS (QCRI Advanced Transcription System), a state-of-the-art speech recognition system for Modern Standard Arabic, is now live on Aljazeera.net! Select daily or archived videos and turn on the closed-caption feature.

In the Media

A Digital Companion

03/02/2016

QCRI's Jalees e-book platform is changing how Arabic ...

How Digital Humanitarians Are Closing the Gaps In Worldwide Disaster Response

01/02/2016

It is now commonplace for people around the world to use social media during emergencies, and the volume of online information coupled with its rapid arrival is becoming increasingly overwhelming to ...

CMU-Q to host regional 24-hour hackathon

21/01/2016

DOHA: Carnegie Mellon University in Qatar (CMU-Q) will host its first regional 24-hour hackathon from Friday at the university’s campus in Education City. The CarnegieApps Hackathon is an annual ...

Upcoming Events

2016

Machine Learning and Data Analytics Symposium - MLDAS 2016

14/03/2016 - 15/03/2016

Building on the success of MLDAS 2015 and MLDAS 2014, the third Machine Learning and Data Analytics Symposium (MLDAS) will be held on ...

QCRI-MIT CSAIL Annual Meeting 2016

20/03/2016

Open invitation to attend the annual research project review meeting by QCRI and MIT CSAIL. Executive overview sessions will highlight our eight main collaborative projects: Understanding Health ...

Self-Driving Cars Are Coming - Public Talk by Daniela Rus

20/03/2016

Abstract: We spend a lot of time in our cars, yet this is a part of our lives where we have been vulnerable to the world's leading cause of bodily harm. Now, the digitization of practically ...

News Releases

QCRI Humanitarian Technology Becomes First from the Middle East to Win the Open Source Software World Challenge Grand Prize

07/12/2015

Doha, December 6, 2015 – Qatar Computing Research Institute, one of Hamad bin Khalifa University’s three specialized national research institutes, recently won the esteemed Open Source Software World...

Qatar Computing Research Institute Welcomes New Batch of Students to Summer Internship Programme

02/06/2015

Hands-On Programme Offers Undergraduate Students An Opportunity To Conduct Research And Gain Real-World Experience

Doha, Qatar, 02 June 2015 - Enjoying its fourth consecutive year of success, the ...

Dr Farnam Jahanian joins Qatar Computing Research Institute's Scientific Advisory Committee

27/05/2015

The Provost of Carnegie Mellon University Brings A Wealth Of Knowledge And Expertise To Qatar Foundation-Based Research Institute
