AI Researcher and Software Engineer
I lead research at the intersection of AI and software engineering, developing machine learning methods that automate code generation, strengthen application security through vulnerability detection, and support developers with intelligent code-analysis tools.
I am fascinated by how AI can bridge human intent and executable code—whether it's translating natural language into production-ready workflows or establishing benchmarks that ensure self-consistent, reliable code understanding. I believe in harnessing the transformative power of intelligent systems to tackle real-world challenges in software engineering, security, and developer productivity.
Building deep representation and generative models that turn natural-language intent into accurate, maintainable code while enabling smarter search, refactoring, and repair.
Developing data-driven techniques and benchmarks that automatically surface, classify, and help remediate security flaws across large, real-world codebases.
Creating AI-augmented assistants and evaluations that measure, streamline, and elevate everyday software-engineering workflows from coding to deployment.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, 2021
... In this paper, we present a large-scale dataset CodeNet, consisting of over 14 million code samples and about 500 million lines of code in 55 different programming languages, which is aimed at teaching AI to code. In addition to its large scale, CodeNet has a rich set of high-quality annotations to benchmark and help accelerate research in AI techniques for a variety of critical coding tasks, including code similarity and classification, code translation between a large variety of programming languages, and code performance (runtime and memory) improvement techniques. ...
2023 60th ACM/IEEE Design Automation Conference (DAC)
... The recent improvement in code generation capabilities due to the use of large language models has mainly benefited general purpose programming languages. Domain specific languages, such as the ones used for IT Automation, received far less attention, despite involving many active developers and being an essential component of modern cloud platforms. This work focuses on the generation of Ansible YAML, a widely used markup language for IT Automation. ...
2024 IEEE Symposium on Security and Privacy (SP)
... We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigative dimensions using our framework. Our evaluation shows LLMs provide non-deterministic responses, incorrect and unfaithful reasoning, and perform poorly in real-world scenarios. ...
60th Annual Meeting of the Association for Computational Linguistics, 2022
... we design structure-guided code transformation algorithms to generate synthetic code clones and inject real-world security bugs, augmenting the collected datasets in a targeted way. We propose to pre-train the Transformer model with such automatically generated program contrasts to better identify similar code in the wild and differentiate vulnerable programs from benign ones. ...
I'm always interested in research collaborations, academic discussions, and opportunities to advance the field of artificial intelligence. Feel free to reach out!