Hamza Harkous profile

harkous@home:~ Hamza Harkous

Staff Research Scientist at Google

Synthetic Data, Agentic AI & Safety

Summary

cat ./summary.md

A 'zero-to-one' Staff Research Scientist who takes AI products from conception to public launch. I own the full project lifecycle, from core research and prototyping to hands-on development and strategic planning. Deep expertise in synthetic data generation, agentic AI, natural-language processing, and applying ML to privacy and safety, with high agency across the full stack.

Latest News

tail -f /var/log/news.log

[2025-05-01] Published workshop paper on Simula at ICLR.

Experience

history | grep "work"

Nov 2023 – Present

Staff Research Scientist, Google

  • Founder, Core Developer & Scientific Lead, Simula: Founded and serve as lead developer of Simula, a multi-step, multimodal, agentic framework for synthetic-data generation. Simula is used by over 180 monthly active Googlers across tens of teams, who have generated hundreds of millions of data points. It has powered tens of internal and external launches, including:
    • Powering Gemini safety classifiers across server and on-device versions.
    • Core to pre-launch evaluation and red-teaming efforts across GenAI features in the Gemini app, Workspace, Photos, Cloud, etc.
    • The primary data engine for ShieldGemma 1 and 2, Google's open-weights text and image safety models.
    • A key data source for Android's real-time call scam detection and messaging scam detection.
    • and more...

Nov 2021 – Oct 2023

Senior Research Scientist, Google

  • Founder, Internal Data-Curation Platform: Bootstrapped an internal web platform combining diversified retrieval, active learning, and LLM assistance. I grew the contributing team to 13 engineers, evolving the project into a foundational data engine for numerous modeling initiatives.
  • ML Lead & Architect, Google Checks: Continued to lead the ML strategy for Google's AI-powered privacy compliance platform, scaling the models to secure thousands of mobile apps.

Feb 2020 – Oct 2021

Research Scientist, Google

  • ML Architect, Google Checks: Architected the initial ML models for Google's privacy compliance platform. I designed the entire ML pipeline, including data labeling, model pre-training, and distillation.
  • Lead Researcher & Developer, Hark: Built the core ML models and infrastructure for a large-scale privacy-feedback analytics system used daily by 300+ triagers and processing tens of millions of reviews.

Jul 2019 – Jan 2020

Applied Scientist, Amazon Alexa

Developed DATATUNER, a neural data-to-text generation system with state-of-the-art semantic fidelity.

Nov 2018 – May 2019

Machine-Learning & Privacy Consultant, Privately SA

Shipped on-device classifiers for hate speech, toxicity, and emotion detection; the technology launched in a BBC-branded mobile keyboard.

Jul 2017 – Sep 2018

Post-doctoral Researcher, EPFL (LSIR-Lab)

Lead author and developer of Polisis, an AI tool that analyzed privacy policies for >45,000 users. The project was featured in major publications like Wired and WSJ.

Professional Highlights

git praise --all

  • Transformative Impact: Received Google's highest "Transformative" performance rating (top 4%) for the cross-organizational impact of the Simula project (2024).
  • 3-Time Google Core Tech Impact Award Winner: Received three separate awards for a primary/lead role in three independent, high-impact projects: Checks, Simula, and a new data-curation platform (awarded to the top 5% of projects).
  • Top Code Contributor: #1 code contributor in Privacy, Safety & Security Research at Google (2023-2025).

Education

less /etc/education_history

  • Ph.D. Computer, Communication & Information Sciences

    EPFL

    Thesis: “Data-Driven, Personalized, Usable Privacy”

  • M.Sc. Communication Systems

    EPFL

  • B.E. Computer & Communications Engineering

    American University of Beirut (Minor: Mathematics)

Awards & Recognition

cat /var/log/awards.log

  • Caspar Bowden Award, PETS (2017)
  • ISSS Excellence Award for Best Ph.D. Thesis, Switzerland (2017)
  • Outstanding Paper Award, ACM CODASPY (2017)
  • Best Dataset Award, ACM IMC (2015)
  • 4-Year Merit Scholarship, AUB (2006-2010)
  • Dean’s Award for Creative Achievement, AUB (2010)

Selected Publications

ls -la /var/log/publications.log

Patents

find / -name "*.patent" -print