2025: Winter beginnings


I am hoping to get back to writing quarterly blog posts. This one was written retrospectively, but I will do my best to write the next one during the quarter.

Work

After the launch of o1 and deep research, I went back to running my own experiments, formulating hypotheses, and trying to understand various phenomena in our research setup. I studied the importance of different elements of synthetic data, such as how much data diversity matters and how well models generalize from short to long. Overall, I have enjoyed having more research freedom and running experiments.

One thing I am really excited about is the BrowseComp benchmark we released. It is a good benchmark in several ways: it measures something meaningful (finding niche information via extensive searching), it is challenging, and the grading is super simple. I hope the community likes it too.

This quarter, I’ve become more interested in AI for scientific and medical reasoning. With most labs converging on the same products (chat assistant, search, SWE agent, and maybe a multimodal assistant), these products will be highly competitive and eventually democratized. AI for scientific innovation, by contrast, is at a much earlier stage and less well defined. It could be a good area for me for several reasons:

  • I identify as a scientist and am inherently passionate about knowledge and innovation.
  • As I’ve gotten older I’ve been thinking more about health, inspired by Bryan Johnson.
  • Unlike a lot of other problems that will be solved in a few months or a year, this direction is something I can see myself doing for the next ten years or more.

I know that AI for science is the focus of OpenAI, but the drawback of AI-for-science startups like FutureHouse is that they don’t have the RL infrastructure that we do, and building that infrastructure is not the bottleneck I want to be working on at the moment. I am super excited to join Karan’s team and see how it goes!

Health

TMJ. My main health focus this quarter was fixing my TMJ clicking. I went to TMJ specialist Amin Samadian’s office and got a TMJ appliance, which cost $4.3k (insurance paid around $1.5k). To make the most of it, I wore the appliance as much as I could outside of work, which wasn’t too inconvenient. My first appliance was one I also wore while eating, and at first I saw no improvement. It broke while I was eating with it, and once I got a replacement it suddenly started to work: no more clicking in my jaw, and the discomfort mostly went away. At the same time, I stopped eating hard-to-chew foods (e.g., burgers, tough beef, and large pieces of pork), which is a good health choice anyway and probably also helped.

Sleep apnea. In late March I did an at-home sleep study, and it turns out I have moderate sleep apnea (pRDI=X, pADI=2.0). It was obvious to Jerry and Karina, but I never realized it myself, partly because I try hard to sleep well and usually don’t wake up during the night. I guess that explains why I’ve felt like I was yawning a lot my entire life. My focus next quarter will be trying to fix this.

PNIF. At the beginning of the quarter, I became interested in using the PNIF (peak nasal inspiratory flow) metric to optimize my breathing, and I took a few weeks’ worth of measurements. PNIF is a good metric because it measures very directly what I care about, and taking a measurement is very cheap. It was interesting to be able to measure the effects of Flonase and nasal dilators, but the day-to-day fluctuation was pretty noisy due to the nasal cycle, so I haven’t tried any intervention yet. A slide deck that I made for a doctor’s appointment is here.

Appendectomy. I also had an appendectomy in February, which was pretty unexpected, but I’m glad it went without major complications.

Left, TMJ metrics; middle, PNIF metrics; right, before appendectomy.

Soccer juggling

After the move to the Mission Bay office, I haven’t been practicing soccer much. The newest thing I started was a challenge of 1,000 soccer juggles a day for 30 days. I like this exercise because juggling is super convenient to practice (you can do it by yourself basically anywhere) and the feedback loop is instant. I tracked the average number of consecutive juggles across ten tries, and it didn’t improve that much, but I can tell I got better than before because I started using my toes instead of my laces and my juggles were more controlled on average. That said, I expected to make a lot more progress than I actually did. The four major things I tried to debug were:

  • Hitting the ball with my toe instead of my laces. I trained this by dropping the ball onto my toe while sitting down to get used to making that contact.
  • Reducing backspin. Originally there was a lot of backspin because I was striking the ball at the wrong angle. I tried to make contact closer to the ground and to my body, and that seemed to help.
  • Controlling height. Originally the height of my kicks was all over the place, but I learned to keep it just above knee level.
  • Each juggle should be perfect. Once one juggle goes wrong, it’s very easy to lose control. So I tried to keep every juggle within a single floor tile where I was playing, and if I deviated too much, I counted it as a miss instead of trying to keep juggling.

Once I’ve improved further, I’ll try to post a YouTube video describing my progression.