Yuxuan Zhang

I'm a sixth-year PhD student in the CIS department at Penn. My advisor is Sebastian Angel.

I'm broadly interested in bridging the gap between data center applications and server processors through an intersection of techniques in OS, compilers, and hardware.

I currently build systems that leverage both hardware and software techniques to improve applications' performance at runtime.

Here are my research statement and my CV.

Education

  • PhD in Computer and Information Science, University of Pennsylvania [...]
    • Ocolos: Online COde Layout OptimizationS
      • Built Ocolos, the first online code layout optimization system for unmodified applications written in unmanaged languages.
    • RPG2: Robust Profile-Guided Runtime Prefetch Generation
      • Built RPG2, a pure-software system that operates on running C/C++ programs, profiling them, injecting prefetch instructions, and then tuning those prefetches to maximize performance.
    • Quilt: Resource-aware Merging of Serverless Workflows
      • Built a serverless optimizer that automatically merges workflows composed of many functions—potentially written in different languages—into a single process, reducing invocation latency, communication overhead, and long chains of cold starts.
  • MS in Electrical Engineering, University of Michigan, Ann Arbor [...]
    • Two-way superscalar R10K Out-of-Order processor
      • Implemented a 2-way associative, non-blocking, write-back data cache and its cache controller, which tracks the status of outstanding cache misses.
      • Implemented key components of the OoO processor, such as the Reservation Station, hardware register map table, Reorder Buffer, and Load Store Queue.
      • Modified visual debugging tools and redesigned the testbench to support performance analysis of the OoO processor.
    • Design and Verify a Cache Coherency Protocol
      • Designed an invalidation-based MOESI self-downgrade cache coherence protocol for a multicore memory system and verified it with the enumerative model checker Murphi.
    • Wikipedia Search Engine
      • Built a scalable search engine that supports information retrieval based on both tf-idf and PageRank scores.
      • Indexed webpages with the Hadoop MapReduce framework to scale to large corpus sizes.
      • Built a new search engine interface with two special features: user-driven scoring and summarization.
  • BS in Electrical Engineering, Harbin Institute of Technology

Publications

  • RPG2: Robust Profile-Guided Runtime Prefetch Generation
    [paper] [code] [slides] [poster]
    Y. Zhang, N. Sobotka, S. Park, S. Jamilan, T. A. Khan, B. Kasikci, G. Pokam, H. Litz, J. Devietti.
    Proc. International Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), May 2024.
  • Online COde Layout OptimizationS via Ocolos
    [paper]
    Y. Zhang, T. A. Khan, G. Pokam, B. Kasikci, H. Litz, J. Devietti.
    IEEE Micro "Top Picks From the 2022 Computer Architecture Conferences", May 2023.
  • OCOLOS: Online COde Layout OptimizationS
    [paper] [code] [slides]
    Y. Zhang, T. A. Khan, G. Pokam, B. Kasikci, H. Litz, J. Devietti.
    Proc. International Symposium on Microarchitecture (MICRO), Oct. 2022.

Employment History

  • Software Engineer Intern, Google LLC [...]
    • TI development Team, Sunnyvale, CA, 05.2025 - 08.2025
    • TBD
      • Analyze performance bottlenecks in Google’s datacenter workloads via profiles collected at runtime.
      • Implement synthetic workloads that yield the same performance metrics as real workloads by using large language models.
  • Software Engineer Intern, VMware [...]
    • Monitor Team, Boston, MA, 05.2022 - 08.2022
    • Prevalidation during Pre-copy of memory pages
      • Offloaded the pre-validation of the destination VM’s page table from the Virtual Machine Monitor (VMM) to ESXi (VMKernel, the hypervisor) after a VM is migrated from source to destination (VMotion), in order to reduce contention when updating page tables on different VMs.
      • Built prevalidation during the pre-copy of memory pages in VMotion to reduce the time spent on pre-validation.
  • Research Intern, Microsoft Research Asia [...]
    • Network Research Group, Beijing, China, 01.2018 - 07.2019
    • GLane on GPU
      • Built a Linux module that can expose an NVIDIA GPU’s physical memory for direct data transfer, and a hardware stack for GPUs in a device-centric cluster to buffer and transfer data.
      • Prototyped CUDA code to perform GPU computation and data transfer in parallel without host CPU involvement.
  • Software Engineer Intern, NVIDIA

Miscellaneous

  • I enjoy writing fan fiction (website in Chinese), and my self-published fan fiction has sold more than 1,100 copies in China.
  • I’m also learning Japanese in the UPenn Japanese Language Program. I passed JLPT N3, and this year I'm targeting N2.
  • I studied Chinese dance for 7 years and played the piano for 5 years, and received certificates from the Beijing Dance Academy and the Central Conservatory of Music.

Copyright © 2015-2025・Yuxuan Zhang