Retrospect of ICPADS 2023

From December 17 to 21, 2023, I attended ICPADS 2023 in Hainan, China. ICPADS is an academic forum for parallel and distributed systems. I was fortunate to present my first-author paper, “Network Contention-Aware Cluster Scheduling with Reinforcement Learning” (paper, long, code).

Getting there

The conference site was Ocean Flower Island, located in the far south of Hainan. It was a 256 km haul from Sanya airport, taking three hours by train. This was my second trip to mainland China (the first being to Shanghai and Hangzhou), and I felt quite confident because of my three years of Chinese education at a foreign language high school. However, I quickly realized that the southern dialect is completely different from the Putonghua spoken in northern cities like Beijing. After struggling to communicate with a station staff member (kind and generous, but unfortunately difficult to understand), I finally managed to buy a ticket to Baimajing, the nearest railway station to Ocean Flower Island. Upon arriving at Baimajing station, I somehow ended up carpooling with a Chinese couple to my Airbnb. That was when I accepted the fact that this trip would be dynamic!

Figure 1. Sanya railway station, me on the train, me being taken to the Airbnb by golf cart, and me finally at the Airbnb.

Ocean Flower Island is an artificial archipelago with numerous hotels, restaurants, and tourist sites. As I recall, the island is divided into two sectors: one primarily for tourists and the other for residents. Since my Airbnb was located in the residential sector, I took a motorbike taxi every morning to reach the conference site. I expected the weather to be extremely hot, but it was actually quite cold and rainy. I had to rely on the only hoodie I had brought, and I caught a cold the next day. However, the food was excellent, and the night view on the island was stunning. I sometimes walked back to my Airbnb along the coastline, enjoying the night scenery.

Figure 2. Motorbike taxi, my main transport for commuting to the conference site.
Figure 3. Night views and food of Ocean Flower Island.

Conference

At the conference, I presented my work DeepShare, a GPU cluster scheduler that uses reinforcement learning (RL) for job scheduling. Training deep learning (DL) models requires deploying each job across multiple GPUs using parallelism strategies, and I observed that distributed DL training on shared GPU clusters is prone to network contention between training jobs. This is because existing schedulers mainly focus on allocating dedicated computation resources (e.g., GPUs) but are often agnostic to shared network resources (e.g., PCIe, NVLink, and InfiniBand). This can be addressed with a contention-aware scheduler that dynamically schedules and migrates jobs according to cluster-wide network contention. DeepShare is an end-to-end system that covers both training such scheduling policies with RL and deploying them on GPU clusters. Scheduling policies trained with DeepShare improve training latency by up to \(20.7\%\) compared to heuristics such as Least Attained Service (LAS) and Shortest Remaining Time First (SRTF).
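To make the two baseline heuristics concrete, here is a minimal sketch of how LAS and SRTF order a job queue. This is not DeepShare's code: the `Job` fields, the function names, and the sample numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # All fields are hypothetical, for illustration only.
    name: str
    attained_gpu_seconds: float  # GPU time the job has received so far
    remaining_seconds: float     # estimated time left until completion


def las_order(jobs):
    """Least Attained Service: jobs that have received the least service go first."""
    return sorted(jobs, key=lambda j: j.attained_gpu_seconds)


def srtf_order(jobs):
    """Shortest Remaining Time First: jobs closest to finishing go first."""
    return sorted(jobs, key=lambda j: j.remaining_seconds)


jobs = [
    Job("a", attained_gpu_seconds=300.0, remaining_seconds=50.0),
    Job("b", attained_gpu_seconds=10.0, remaining_seconds=900.0),
    Job("c", attained_gpu_seconds=120.0, remaining_seconds=400.0),
]

print([j.name for j in las_order(jobs)])   # ['b', 'c', 'a']
print([j.name for j in srtf_order(jobs)])  # ['a', 'c', 'b']
```

Note that neither ordering looks at where jobs are placed or which links they share, which is the gap a contention-aware policy is meant to close.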

Figure 4. Illustration of DeepShare.

This was my first time presenting my work at an international conference. Before attending, a postdoctoral researcher in my group emphasized the importance of the “research community.” To be honest, at the time, I didn’t fully understand why maintaining a strong community is important for individual researchers. However, I was both surprised and grateful for the attention my work received. Other researchers engaged deeply with my presentation, asking detailed questions and discussing future directions. I was also intrigued by the work of others and realized that many people share interests similar to mine. After attending the conference, I came to appreciate the value of maintaining a healthy research community: one that is willing to review and provide feedback on others’ work, collaborate across disciplines, and collectively push the boundaries of technology. I made many friends during the breaks and at the banquet. It was great fun to chat about their lives in grad school and in their hometowns. I realized that attending conferences will be a valuable experience throughout my academic career.

Figure 5. Me during the poster session and with friends I met at ICPADS.

After the conference, I returned to Sanya and spent a few days there on vacation. Although the break was short, I came back relaxed and revitalized, ready to start researching again. This trip to ICPADS and Hainan, I believe, will stay in my memory for a long time.

Figure 6. Me on vacation in Sanya.