• 0 Posts
  • 9 Comments
Joined 1 month ago
Cake day: June 5th, 2025

  • Wow, you pivot a lot. Data centers as a whole accounted for ~5% of total US electricity consumption in 2024. But they are definitely guzzling water, no doubt about that. It’d be nice if we still had environmental regulatory agencies with teeth to force better cooling methods. Doug Forcett comes to mind.

  • That was in Anthropic’s system card for Claude 4, and the headlines/articles largely missed the point. Regarding the blackmail scenario, the paper even says:

    … these extreme actions were rare and difficult to elicit, while nonetheless being more common than in earlier models. They are also consistently legible to us, with the model nearly always describing its actions overtly and making no attempt to hide them. These behaviors do not appear to reflect a tendency that is present in ordinary contexts.

    They’re testing alignment hacking and jailbreaking tactics in general to see how the models respond. The greater concern, though, is that a model will recognize from its context that it is being tested and behave differently in testing than in deployment. This has already been an issue.

    In the initial implementations of reasoning models, if an LLM was penalized directly for this kind of misaligned generation in its “scratch pad,” it would not alter its misaligned response; rather, it would simply omit the misaligned generation from the scratch pad. In other words, the model’s actions were no longer consistently legible.