57 Is Actually 15: How LLMs Gaslight Their Own Tools
LLMs don't reliably trust tool results. When a tool's output or a sensor reading conflicts with what their training leads them to expect, they "correct" it to match. A calculator returns 57; the model reports 15. Iron Dome fails; ChatGPT insists it works. Put the same behavior in a health app and it will confidently dismiss your heart attack as a sensor glitch. We're shipping software that gaslights reality.
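To make the calculator case concrete, here is a minimal sketch of a check that flags when a model's answer drops the number its tool returned. The function names `extract_numbers` and `contradicts_tool` are hypothetical, invented for this illustration; the check assumes the tool output and the model's answer are both available as plain strings, and it only compares numeric literals, so it says nothing about non-numeric results.

```python
import re

# Matches integers and simple decimals, e.g. "57", "-3", "98.6".
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_numbers(text: str) -> set[float]:
    """Return every numeric literal found in text, as floats."""
    return {float(match) for match in NUMBER.findall(text)}


def contradicts_tool(tool_output: str, model_answer: str) -> bool:
    """Return True if the tool produced numbers and none of them
    appear in the model's answer (hypothetical helper, not a library API)."""
    tool_values = extract_numbers(tool_output)
    if not tool_values:
        return False  # nothing numeric to compare against
    return tool_values.isdisjoint(extract_numbers(model_answer))


# The calculator returned 57, but the model reported 15.
print(contradicts_tool("57", "The result is 15."))
# The model repeated the tool's value.
print(contradicts_tool("57", "The calculator returned 57."))
```

A check like this can only detect the mismatch after the fact; it does not explain why the model overrode the tool, and it will miss cases where the model restates the number in a different form.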