I wanted to verify this for myself, so I set up a small test harness on my production server. It ran 360 chat completions across a range of models, cancelling each request immediately after the first token was received. Below are the resulting first-token latency measurements:
anadim (@dimitrispapail)
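The harness described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: the function name, the fake stream, and the 50 ms delay are all hypothetical, and a real run would wrap an HTTP streaming client (closing the response to cancel) rather than a local generator.

```python
import time
from typing import Iterable, Tuple

def first_token_latency(stream: Iterable[str]) -> Tuple[str, float]:
    """Consume a token stream just long enough to receive the first
    token, then cancel, returning (token, seconds_elapsed)."""
    start = time.perf_counter()
    it = iter(stream)
    token = next(it)  # blocks until the first token arrives
    latency = time.perf_counter() - start
    # Abandoning the iterator is the "cancel after first token" step;
    # generators expose close(), which aborts the underlying stream.
    # With a real streaming HTTP client you would close the response.
    if hasattr(it, "close"):
        it.close()
    return token, latency

# Hypothetical stand-in for a model API's token stream:
def fake_model_stream():
    time.sleep(0.05)   # simulated time to first token
    yield "Hello"
    yield " world"     # never reached: the request is cancelled first

token, latency = first_token_latency(fake_model_stream())
```

A full harness would loop this over each model and collect the per-request latencies; only the first token is ever generated, so the cost per request stays small.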