Summary
cat ./summary.md
I am a 'zero-to-one' Staff Research Scientist who takes AI products from conception to public launch. I own the full project lifecycle, from core research and prototyping to hands-on development and strategic planning. My expertise spans synthetic-data generation, agentic AI, natural-language processing, and applying ML to privacy and safety, and I work with high agency across the full stack.
Latest News
tail -f /var/log/news.log
[2025-05-01] Published workshop paper on Simula at ICLR.
Experience
history | grep "work"
Nov 2023 – Present
Staff Research Scientist, Google
- Founder, Core Developer & Scientific Lead, Simula: Founded Simula and serve as its lead developer. Simula is a multi-step, multimodal, agentic framework for synthetic-data generation. It has over 180 monthly active users at Google across tens of teams, who have generated hundreds of millions of data points. It has powered tens of internal and external launches, including:
- Powering Gemini safety classifiers across server and on-device versions.
- Core to pre-launch evaluation and red-teaming efforts across GenAI features in the Gemini app, Workspace, Photos, Cloud, etc.
- The primary data engine for ShieldGemma 1 and 2, Google's open-weights text and image safety models.
- A key data source for Android's real-time call scam detection and messaging scam detection.
- and more...
Nov 2021 – Oct 2023
Senior Research Scientist, Google
- Founder, Internal Data-Curation Platform: Bootstrapped an internal web platform combining diversified retrieval, active learning, and LLM assistance. I grew the contributing team to 13 engineers, evolving the project into a foundational data engine for numerous modeling initiatives.
- ML Lead & Architect, Google Checks: Continued to lead the ML strategy for Google's AI-powered privacy compliance platform, scaling the models to secure thousands of mobile apps.
Feb 2020 – Oct 2021
Research Scientist, Google
- ML Architect, Google Checks: Architected the initial ML models for Google's privacy compliance platform. I designed the entire ML pipeline, including data labeling, model pre-training, and distillation.
- Lead Researcher & Developer, Hark: Built the core ML models and infrastructure for a large-scale privacy-feedback analytics system used daily by 300+ triagers and processing tens of millions of reviews.
Jul 2019 – Jan 2020
Applied Scientist, Amazon Alexa
Developed DATATUNER, a neural data-to-text generation system with state-of-the-art semantic fidelity.
Nov 2018 – May 2019
Machine-Learning & Privacy Consultant, Privately SA
Shipped on-device classifiers for hate-speech, toxicity, and emotion detection; technology launched in a BBC-branded mobile keyboard.
Professional Highlights
git praise --all
- Transformative Impact: Received Google's highest "Transformative" performance rating (top 4%) for the cross-organizational impact of the Simula project (2024).
- 3-Time Google Core Tech Impact Award Winner: Received three separate awards for a primary/lead role in three independent, high-impact projects: Checks, Simula, and a new data-curation platform (awarded to the top 5% of projects).
- Top Code Contributor: Ranked #1 code contributor in Privacy, Safety, & Security Research at Google (2023-2025).
Education
less /etc/education_history
- Ph.D. Computer, Communication & Information Sciences
EPFL
- M.Sc. Communication Systems
EPFL
- B.E. Computer & Communications Engineering
American University of Beirut (Minor: Mathematics)
Awards & Recognition
cat /var/log/awards.log
- Caspar Bowden Award, PETS (2017)
- ISSS Excellence Award for Best Ph.D. Thesis, Switzerland (2017)
- Outstanding Paper Award, ACM CODASPY (2017)
- Best Dataset Award, ACM IMC (2015)
- 4-Year Merit Scholarship, AUB (2006-2010)
- Dean’s Award for Creative Achievement, AUB (2010)
Selected Publications
ls -la /var/log/publications.log
- Orchestrating Synthetic Data with Reasoning
SynthData at ICLR 2025.
- ShieldGemma: Generative AI Content Moderation Based on Gemma
2024.
- Automated Cookie Notice Analysis and Enforcement
USENIX Security 2023.
- Hark: A Deep Learning System for Navigating Privacy Feedback at Scale
USENIX Security 2022.
- PriSEC: A Privacy Settings Enforcement Controller
USENIX Security 2021.
- Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
COLING 2020.
- The Privacy Policy Landscape After the GDPR
PoPETS 2019.
- Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning
USENIX Security 2018.
- If You Can’t Beat Them, Join Them: A Usability Approach to Interdependent Privacy in Cloud Apps
CODASPY 2017.
- The Curious Case of the PDF Converter that Likes Mozart
PETS 2016.