
What I Believe In and Aspire to Build Over the Next Decade

#biased-thoughts

  1. Inference Must Be Cheap and Fast Enough

    The demand for AI model inference is growing exponentially. Meeting it requires efficient, cost-effective, and robust serving systems. The intersection of large language models (LLMs) and systems optimization offers a unique opportunity here: treated as an engineering problem, inference efficiency can be tackled through hardware-software co-design and new programming models. Starting this optimization now is critical to making AI scalable and accessible across diverse use cases.
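
    To make the engineering framing concrete, here is a back-of-envelope model of why batching matters so much for serving cost. It assumes autoregressive decode is memory-bandwidth-bound (each step streams all weights once), which is a common first-order approximation; the specific model size and bandwidth numbers below are illustrative, not measurements.

```python
def decode_tokens_per_sec(params_billion, bytes_per_param, mem_bw_gb_s, batch_size):
    """Rough upper bound on decode throughput for a memory-bound LLM.

    Each decode step must stream every weight from memory once; batching
    amortizes that traffic across requests. KV-cache reads, kernel launch,
    and communication overheads are ignored, so this bound is optimistic.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    steps_per_sec = mem_bw_gb_s * 1e9 / weight_bytes
    return steps_per_sec * batch_size

# e.g. a hypothetical 7B-parameter model in fp16 on ~1 TB/s of bandwidth:
# decode_tokens_per_sec(7, 2, 1000, batch_size=1)  -> ~71 tokens/s total
# decode_tokens_per_sec(7, 2, 1000, batch_size=32) -> ~2286 tokens/s total
```

    The aggregate throughput scaling near-linearly with batch size (until compute or memory capacity binds) is exactly the kind of lever that hardware-software co-design exploits.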

  2. The Importance of System 2 Thinking

    High-quality, diverse training data is essential for pretraining models, but alignment with preference datasets is equally critical. Furthermore, techniques like Monte Carlo Tree Search (MCTS) for exploration with reinforcement learning (RL) during response generation hold significant promise. I was initially skeptical of the concept of “reasoning” models, but I’ve come to realize that humans naturally think internally before articulating their thoughts. That realization highlights the importance of scaling inference-time reasoning to achieve better performance and more human-like decision-making.
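
    To ground the MCTS reference, here is a minimal, self-contained UCT (MCTS with UCB1) sketch over a toy generation tree. The toy domain (appending characters to a string, rewarding 'a's) and all names are illustrative stand-ins for token-level search over model responses; this is the textbook algorithm, not any particular paper's variant.

```python
import math
import random

def uct_search(root, actions, step, reward, horizon, iters=400, c=1.4, seed=0):
    """Minimal UCT: pick the root action that survives simulated search.

    States are hashables; step(state, action) returns the next state;
    reward(state) scores a terminal state reached after `horizon` steps.
    """
    rng = random.Random(seed)
    N, W, children = {}, {}, {}  # visit counts, total value, expanded edges

    def ucb(parent, child):
        n = N.get(child, 0)
        if n == 0:
            return float("inf")  # always try unvisited children first
        return W[child] / n + c * math.sqrt(math.log(N[parent]) / n)

    for _ in range(iters):
        state, path, depth = root, [root], 0
        # 1. Selection: descend through expanded nodes by UCB1.
        while state in children and depth < horizon:
            state = max(children[state], key=lambda ch: ucb(state, ch))
            path.append(state)
            depth += 1
        # 2. Expansion and 3. random rollout to the horizon.
        if depth < horizon:
            children[state] = [step(state, a) for a in actions]
            state = rng.choice(children[state])
            path.append(state)
            depth += 1
            while depth < horizon:
                state = step(state, rng.choice(actions))
                depth += 1
        value = reward(state)
        # 4. Backpropagation along the selected path.
        for node in path:
            N[node] = N.get(node, 0) + 1
            W[node] = W.get(node, 0.0) + value
    # Return the most-visited child of the root.
    return max(children[root], key=lambda ch: N.get(ch, 0))
```

    In an RL-for-reasoning setting, `step` would be the model proposing a continuation, `reward` a learned or programmatic verifier, and the visit statistics would feed back into training.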

  3. Can Robots Be Prompted Like LLMs?

    Integrating multiple modalities, such as vision, voice, and text, end-to-end into large foundation models is essential for unlocking a richer representation space and significantly improving model capabilities. While LLMs encode vast knowledge, they are confined to the textual domain and cannot interact directly with the real world. To grant robots similar real-world capabilities, they must learn to interact with their physical (or simulated) environment, process sensory inputs, and build grounded world models. The ultimate vision is robots that can effectively predict and execute actions, bridging the gap between abstract language understanding and concrete real-world embodiment. By integrating these modalities, we can unlock the next generation of AI.
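
    One way "prompting a robot like an LLM" can work is to put actions in the same token vocabulary as language, so a single sequence model reads a text instruction plus perception tokens and emits action tokens. The sketch below shows only the discretization half of that idea; the bin count, ranges, and function names are my own illustrative assumptions, not any specific system's interface.

```python
def action_to_tokens(action, low=-1.0, high=1.0, bins=256):
    """Discretize a continuous action vector (e.g. joint deltas) into
    integer tokens a sequence model can emit like ordinary vocabulary."""
    span = high - low
    return [min(bins - 1, int((a - low) / span * bins)) for a in action]

def tokens_to_action(tokens, low=-1.0, high=1.0, bins=256):
    """Invert the discretization, mapping each token to its bin center."""
    span = high - low
    return [low + (t + 0.5) / bins * span for t in tokens]

# Round-tripping loses at most half a bin width per dimension, so a
# 256-bin scheme over [-1, 1] quantizes to within ~0.004 per component.
```

    Under this framing, "prompting" a robot means prepending an instruction and observation tokens to the sequence and decoding action tokens, which is what makes the language-model toolbox (prompting, finetuning, search) transfer to embodiment.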