Microsoft Research has launched Project Gecko, an initiative dedicated to building cost-effective, customizable AI systems for the global majority. The project focuses on delivering vital expertise in local languages, with culturally sensitive content and multimodal engagement through text, voice and video. It is a collaborative effort involving researchers from Microsoft Research Africa (Nairobi), Microsoft Research India and the Microsoft Research Accelerator in the United States, along with Digital Green, a global development organization focused on community-driven digital infrastructure for agriculture.
A core innovation of Project Gecko is MMCTAgent, a multimodal critical-thinking agent framework. The system analyzes speech, image and video inputs to provide relevant, context-aware responses. MMCTAgent is available on Azure AI Foundry Labs, and its code is published on GitHub. Microsoft positions the work as part of its mission to empower everyone globally, arguing that equitable generative AI, built to reflect culturally nuanced experiences, is key to advancing AI responsibly and inclusively.
The project works with FarmerChat, a speech-first AI assistant from Digital Green that advises millions of farmers with trusted agricultural recommendations. Over two decades, Digital Green has curated a library of more than 10,000 videos in more than 40 languages and dialects. Project Gecko's goal was to evolve FarmerChat from a basic Q&A tool into a trusted farming companion. The team envisioned farmers submitting queries by speech or text and receiving actionable, step-by-step answers in their preferred language, delivered as text, voice and a video that starts at the point where the relevant solution is shown.
MMCTAgent is central to this vision: it augments experimental frontier models with domain-specific tools. It processes multiple types of information and breaks complex questions down into smaller parts. Using techniques such as natural language processing (NLP) and computer vision, it interprets the videos and transcripts in the Digital Green library, making their content accessible.
To overcome the lack of training data and computational resources for low-resource languages, the Project Gecko team is building new models from scratch, including automatic speech recognition (ASR) and text-to-speech (TTS) models. The team is also using small language models (SLMs), which require significantly less computing power than large language models (LLMs), making them easier to fine-tune for targeted domains and languages.
Looking ahead, Project Gecko plans to expand its impact into other domains, including healthcare, education and retail.
What This Means for ERP Insiders
ERP ecosystems need to support culturally adaptive, multimodal AI architectures. By prioritizing local languages, speech-first interfaces and contextual content, Project Gecko signals a broader shift: ERP platforms will need to embed accessible, domain-sensitive intelligence. This creates opportunities for vendors and integrators to differentiate through inclusivity, affordable AI and regional knowledge integration.
MMCTAgent highlights a turning point in ERP modernization, as multimodal reasoning becomes essential for operational decision support. Its ability to analyze speech, text and video shows why future ERP systems will need deeper orchestration across unstructured content, domain tools and workflow intelligence. This will prompt enterprise architects to redesign integration patterns, metadata models and user-experience strategies around advanced cognitive capabilities.
Small language models and ASR/TTS technology offer an opportunity at the edge. By showing that targeted, efficient models can run in low-resource environments, Project Gecko points to a future where ERP solutions increasingly rely on domain-optimized, cost-effective AI components. Such components can reduce risk, expand reach and enable industry-specific extensions.