Hamza Harkous profile

harkous@home:~ Hamza Harkous

Senior Staff Research Scientist at Google

Synthetic Data, Agentic AI & Safety

Summary

cat ./summary.md

A 'zero-to-scale' Senior Staff Research Scientist working at the intersection of synthetic data, environment simulation, and agents, owning the entire project lifecycle with high agency across the full stack.

Latest News

tail -f /var/log/news.log

[2025-05-01] Published a workshop paper on Simula at ICLR.

Experience

history | grep "work"

Senior Staff Research Scientist, Google

Nov 2025 – Present

Staff Research Scientist, Google

Nov 2023 – Oct 2025

  • Founder & Lead, Simula Framework: Co-founded and grew Google's leading internal synthetic data framework (with a total of two technical FTEs) to serve 450 monthly active Googlers (>1k unique users in 2025). Enabled the generation of >1 billion data items while achieving the highest satisfaction score (93%) across all Google data tools.
  • Creator, Simula Agent: Built the most widely used internal data-generation agent at Google (in use since 2024). Currently serving >200 monthly active users, the agent auto-generates hundreds of end-to-end Colabs per month, tailored to Googlers' data-generation needs.
  • Gemma Ecosystem: Simula served as a primary data engine for the Gemma family, including the upcoming Gemma models, ShieldGemma, FunctionGemma, and MedGemma.
  • Gemini Ecosystem: The framework contributed to frontier model development and safety:
    • Safety & Security: Powering all safety classifiers (server-side & on-device); contributed to Cybersecurity, Red-Teaming, and Prompt Injection defenses.
    • Distillation & Features: Supporting Gemini Flash Lite distillation, Nano (e.g., i18n post-training), and Gemini app features.
    • Evaluations: Enabled evaluations in specialized Gemini verticals.
  • Wider Impact: Instrumental in public launches, such as Android Call Scam and Messages Spam detection. The framework is also integrated into Vertex AI's GenAI Evaluation Service.

Senior Research Scientist, Google

Nov 2021 – Oct 2023

  • Founder, Internal Data-Curation Platform: Bootstrapped an internal web platform combining diversified retrieval, active learning, and LLM assistance. Grew the contributing team to 13 engineers, evolving the project into a foundational data engine for numerous modeling initiatives.
  • ML Lead & Architect, Google Checks: Continued to lead the ML strategy for Google's AI-powered privacy compliance platform, scaling the models to secure thousands of mobile apps.

Research Scientist, Google

Feb 2020 – Oct 2021

  • ML Architect, Google Checks: Architected the initial ML models for Google's privacy compliance platform, designing the entire ML pipeline, including data labeling, model pre-training, and distillation.
  • Lead Researcher & Developer, Hark: Built the core ML models and infrastructure for a large-scale privacy-feedback analytics system used daily by 300+ triagers and processing tens of millions of reviews.

Applied Scientist, Amazon Alexa

Jul 2019 – Jan 2020

Developed DATATUNER, a neural data-to-text generation system with state-of-the-art semantic fidelity.


Machine-Learning & Privacy Consultant, Privately SA

Nov 2018 – May 2019

Shipped on-device classifiers for hate-speech, toxicity, and emotion detection; the technology launched in a BBC-branded mobile keyboard.


Post-doctoral Researcher, EPFL (LSIR-Lab)

Jul 2017 – Sep 2018

Lead author and developer of Polisis, an AI tool that analyzed privacy policies for >45,000 users. The project was featured in major outlets such as Wired and the WSJ.

Professional Highlights

git praise --all

  • Transformative Impact: Received Google's highest "Transformative" performance rating (top 4%) for the cross-organizational impact of the Simula project (2024).
  • 3-Time Google Core Tech Impact Award Winner: Received three awards, each for a primary or lead role in an independent high-impact project (Checks, Simula, and a new data-curation platform); the award is given to the top 5% of projects.
  • Top Code Contributor: #1 code contributor in Privacy, Safety, & Security Research at Google (2023–2025).

Education

less /etc/education_history

  • Ph.D. Computer, Communication & Information Sciences

    EPFL

    Thesis: “Data-Driven, Personalized, Usable Privacy”

  • M.Sc. Communication Systems

    EPFL

  • B.E. Computer & Communications Engineering

    American University of Beirut (Minor: Mathematics)

Awards & Recognition

cat /var/log/awards.log

  • Caspar Bowden Award, PETS (2017)
  • ISSS Excellence Award for Best Ph.D. Thesis, Switzerland (2017)
  • Outstanding Paper Award, ACM CODASPY (2017)
  • Best Dataset Award, ACM IMC (2015)
  • 4-Year Merit Scholarship, AUB (2006-2010)
  • Dean’s Award for Creative Achievement, AUB (2010)

Selected Publications

ls -la /var/log/publications.log

Patents

find / -name "*.patent" -print