belter 2 hours ago

Not an issue with DeepSeek or any other LLM in particular. The data is from CommonCrawl.

  • CSSer 2 hours ago

    And it makes sense for it to slip into production too, because as far as the tasks and values in processing training data go, there’s no incentive to filter something like this out.