

Kids will do things if they see other children doing it in pictures and videos. It’s easier to normalize sexual behavior with CP than without.
Although that’s true, such material can easily be used to groom children, which is where I think the real danger lies.
I really wish they had excluded children from the datasets.
You can’t really put a stop to it anymore, but I don’t think it should be normalized and accepted just because there isn’t a direct victim. We are also talking about distribution here, not something being done in private at home.
We first use the DE-COP membership inference attack (Duarte et al. 2024) to determine whether a particular data sample was part of a target model’s training set. This works by quizzing an LLM with a multiple-choice test to see whether it can identify original human-authored O’Reilly book paragraphs among machine-generated paraphrased alternatives that we present it with. If the model frequently identifies the actual (human-authored) book text correctly (for books published during the model’s training period), this likely indicates prior model recognition (training) of that text.
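The quoted procedure can be sketched roughly as follows. This is a minimal toy sketch, not the paper’s implementation: `decop_guess`, `membership_rate`, and the `score` callable are hypothetical stand-ins (the real DE-COP poses the multiple-choice question to the LLM directly rather than comparing per-option scores), and the toy scorer just simulates a model that “remembers” memorized text.

```python
import random

def decop_guess(score, original, paraphrases):
    # Build the multiple-choice options and shuffle so position gives no hint.
    options = [original] + list(paraphrases)
    random.shuffle(options)
    # Stand-in for the quiz answer: pick the option the model rates highest.
    pick = max(options, key=score)
    return pick == original

def membership_rate(score, samples):
    # Fraction of quizzes where the "model" picks the true original.
    # Rates well above chance (1 / number of options) would suggest the
    # passage was seen during training.
    hits = sum(decop_guess(score, orig, paras) for orig, paras in samples)
    return hits / len(samples)

# Toy stand-in for a model: rates memorized text higher than anything else.
memorized = {"the original paragraph"}
toy_score = lambda text: 1.0 if text in memorized else 0.0
samples = [("the original paragraph",
            ["paraphrase a", "paraphrase b", "paraphrase c"])]
print(membership_rate(toy_score, samples))  # 1.0 with this toy scorer
```

With four options, chance accuracy is 0.25, so sustained accuracy near 1.0 on pre-cutoff books (but not post-cutoff ones) is the signal the excerpt describes.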
I’m almost certain OpenAI trained on copyrighted content, but this proves nothing other than its ability to distinguish between human- and machine-written text.
It isn’t. I’d even say that simply completing puzzles is far from AGI, even if the puzzles are complex.
I’m mostly talking about being able to train on copyrighted content. This is on me though, I got mixed up. That’s what I meant in my first comment.
If you think someone can train a model on legally obtained data (Google images, YouTube, internet archive), then that is fair.
Personally, I think using pirated content, or at least ripped copies of bought content (Netflix, DVDs), should be exempt (for everyone obviously, not just OpenAI.) Some data is already locked up by huge mega corps like record labels, Hollywood, publishing houses, etc. OpenAI can afford the cost, but the little guys will be screwed when it comes to SOTA.
It’s also worth noting that most current lawsuits target how the data is used, not how it’s sourced, if I’m not mistaken. The laws coming from these lawsuits won’t bolster anti-piracy laws but copyright laws instead, targeting fair use and transformative-use clauses imo.
It’s sadly already happening in regard to stack.
Mostly YouTube, Reddit, and image search. I could just record a Netflix stream if I needed the whole movie, though I guess recording a Netflix stream is pirating? Probably easier with a torrent.
What does it matter? I don’t think pirating is unethical, especially when it’s not even redistribution but transformative use. OpenAI has never stopped me from pirating or even asked me to stop. Not sure what you mean by “no one else”.
You ever ask yourself if the memes made from movie scenes used pirated media?
What Pirate Bay is doing isn’t exactly transformative. I pirate most of my media, and I’m not against better copyright laws or better treatment of Pirate Bay; I just think the situations are different.
I don’t think saying “if Pirate Bay is illegal, then training AI without compensation should be too” is exactly fair. (I wish the actual people contributing could be compensated, but the way it’s set up, we would be handing a few companies a monopoly while mostly compensating data aggregators.)
Reforms don’t have to be pro-corporate slop.
Sadly, the media and most of the population are practically begging for it. Couple that with the pressure exerted by record companies, publishing houses, etc., and it’s clear those are the reforms we’ll get, if any.
In our current society, little people can get away with it. I can take whatever style I want and train a model on it. There are already many Ghibli resources in the open-source scene, and a lot of them date from two years ago.
This whole situation is rage bait to manipulate the population into cheering for new copyright laws, so politicians get little pushback when they start writing pro-corporate laws regarding AI.
I understand the sentiment but I think it’s foolhardy.
And all of that mostly benefits the data holders and big AI companies. Most image data sits on platforms like Getty, DeviantArt, Instagram, etc. It’s even worse for music and literature, where three record labels and five publishers own most of it.
If we don’t get a proper music model before the lawsuits pass, we will never be able to generate music without being told what is or isn’t okay to write about.
I think it will be punished, but not how we hope. The laws will end up rewarding the big data holders (Getty, record labels, publishers) while locking out open source tools. The paywalls will stay and grow. It’ll just formalize a monopoly.
It shouldn’t be much of a problem using a Ghibli-based model with img2img. I personally use Forge as my main UI; models can be found on civitai.com . It’s easily possible, you just need a bit of VRAM, and setting it up is more work. You might get more mileage by using ControlNet in conjunction with img2img.
Banning the tech, banning generated cp on the internet or banning it at home?
I’m a big advocate of AI and don’t personally want any kind of banning or censorship of the tools.
I don’t think it should be published on any kind of image sharing sites. I don’t hold people publishing it in high regard and I’m not against some kind of consequence. I generally view prison as unproductive though.
At home, I’m not sure. People, imo, can do what they want behind closed doors. I don’t want any kind of surveillance, but I don’t know how I would react if it got brought up at a trial as a kind of proof, if the allegations have something to do with that theme (child molestation).
I also don’t think we need much of a reason to ban it on the web.