Machine Bullshit: Characterizing the Emergent Disregard for Truth in LLMs

delichon a day ago

  Turns out, aligning LLMs to be "helpful" via human feedback actually teaches them to bullshit—and Chain-of-Thought reasoning just makes it worse!