Summary
cat ./summary.md
A 'zero-to-scale' Senior Staff Research Scientist working at the intersection of synthetic data, environment simulation, and agents. I own the entire project lifecycle, with high agency across the full stack.
Latest News
tail -f /var/log/news.log
[2025-05-01] Published a workshop paper on Simula at ICLR.
Experience
history | grep "work"
Senior Staff Research Scientist, Google
Nov 2025 – Present
Staff Research Scientist, Google
Nov 2023 – Oct 2025
- Founder & Lead, Simula Framework: Co-founded and grew Google's leading internal synthetic data framework (with a total of 2 technical FTEs) to serve 450 monthly active Googlers (>1k unique users in 2025). Enabled the generation of >1 billion data items while achieving the highest satisfaction score (93%) across all Google data tools.
- Creator, Simula Agent: Built the most widely used internal data-generation agent (since 2024). Currently serving >200 monthly active users, the agent auto-generates hundreds of end-to-end Colabs per month, tailored to Googlers' data-generation needs.
- Gemma Ecosystem: Simula serves as a primary data engine for the Gemma family, including ShieldGemma, FunctionGemma, MedGemma, and upcoming Gemma models.
- Gemini Ecosystem: The framework contributed to frontier model development and safety:
  - Safety & Security: Powers all safety classifiers (server-side & on-device); contributed to cybersecurity, red-teaming, and prompt-injection defenses.
  - Distillation & Features: Supports Gemini Flash Lite distillation, Gemini Nano (e.g., i18n post-training), and Gemini app features.
  - Evaluations: Enabled evaluations in specialized Gemini verticals.
- Wider Impact: Instrumental in public launches, such as Android Call Scam and Messages Spam detection. The framework is also integrated into Vertex AI's GenAI Evaluation Service.
Senior Research Scientist, Google
Nov 2021 – Oct 2023
- Founder, Internal Data-Curation Platform: Bootstrapped an internal web platform combining diversified retrieval, active learning, and LLM assistance. Grew the contributing team to 13 engineers and evolved the project into a foundational data engine for numerous modeling initiatives.
- ML Lead & Architect, Google Checks: Continued to lead the ML strategy for Google's AI-powered privacy compliance platform, scaling the models to secure thousands of mobile apps.
Research Scientist, Google
Feb 2020 – Oct 2021
- ML Architect, Google Checks: Architected the initial ML models for Google's privacy compliance platform and designed the entire ML pipeline, including data labeling, model pre-training, and distillation.
- Lead Researcher & Developer, Hark: Built the core ML models and infrastructure for a large-scale privacy-feedback analytics system that processes tens of millions of reviews and is used daily by 300+ triagers.
Applied Scientist, Amazon Alexa
Jul 2019 – Jan 2020
Developed DATATUNER, a neural data-to-text generation system with state-of-the-art semantic fidelity.
Machine-Learning & Privacy Consultant, Privately SA
Nov 2018 – May 2019
Shipped on-device classifiers for hate-speech, toxicity, and emotion detection; the technology launched in a BBC-branded mobile keyboard.
Professional Highlights
git praise --all
- Transformative Impact: Received Google's highest "Transformative" performance rating (top 4%) for the cross-organizational impact of the Simula project (2024).
- 3-Time Google Core Tech Impact Award Winner: Received three separate awards for primary or lead roles in three independent, high-impact projects (Checks, Simula, and a new data-curation platform); the award is given to the top 5% of projects.
- Top Code Contributor: #1 code contributor in Privacy, Safety, & Security Research at Google (2023–2025).
Education
less /etc/education_history
- Ph.D. Computer, Communication & Information Sciences
EPFL
- M.Sc. Communication Systems
EPFL
- B.E. Computer & Communications Engineering (Minor: Mathematics)
American University of Beirut
Awards & Recognition
cat /var/log/awards.log
- Caspar Bowden Award, PETS (2017)
- ISSS Excellence Award for Best Ph.D. Thesis, Switzerland (2017)
- Outstanding Paper Award, ACM CODASPY (2017)
- Best Dataset Award, ACM IMC (2015)
- 4-Year Merit Scholarship, AUB (2006–2010)
- Dean’s Award for Creative Achievement, AUB (2010)
Selected Publications
ls -la /var/log/publications.log
- Orchestrating Synthetic Data with Reasoning
SynthData at ICLR 2025.
- ShieldGemma: Generative AI Content Moderation Based on Gemma
2024.
- Automated Cookie Notice Analysis and Enforcement
USENIX Security 2023.
- Hark: A Deep Learning System for Navigating Privacy Feedback at Scale
USENIX Security 2022.
- PriSEC: A Privacy Settings Enforcement Controller
USENIX Security 2021.
- Have Your Text and Use It Too! End-to-End Neural Data-to-Text Generation with Semantic Fidelity
COLING 2020.
- The Privacy Policy Landscape After the GDPR
PoPETS 2019.
- Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning
USENIX Security 2018.
- If You Can’t Beat Them, Join Them: A Usability Approach to Interdependent Privacy in Cloud Apps
CODASPY 2017.
- The Curious Case of the PDF Converter that Likes Mozart
PETS 2016.