Discussion about this post

Jan Zilinsky

Fascinating results! Especially in light of the popular belief in computer science that LLMs are very brittle. (It sounds like you tested various prompts and the results didn’t move much.)

I also wondered how these outcomes (high cosine similarity, diversity of cited sources) for these political factual questions would compare to 1) other factual questions (“what is photosynthesis?”) or 2) responses to common questions people ask chatbots (“how often should I exercise?” or something like that). That’d tell us whether developers are paying more attention to output stability for some topics, like politics, than for others.
