Much of AI is not about generating new knowledge, but about distributing existing knowledge in new form. That creates copyright conflicts. And it threatens how humans will collaborate in the future.
Dear Sir,
As a researcher in generative AI and an ethicist working at the intersection of cognitive technology, copyright, and public trust, I found this piece both poignant and prescient.
Your concern over the erosion of intellectual integrity under the guise of "artificial intelligence" is shared by many of us who see how opacity, scale, and a lack of enforceable provenance frameworks are eroding not only copyright norms but the philosophical ground upon which creative authorship stands.
The core issue you raise (the untraceable intermixing of genuine intellectual labour and machine-hallucinated content) is exacerbated by a legal landscape not yet equipped to parse intention, origin, or fair use at the computational scale at which models like Midjourney and Stable Diffusion now operate.
The lawsuits involving these platforms, particularly the DMCA cases filed in the Northern District of California, underscore the risk of precedent establishing liability chains so deep and diffuse that downstream users (collectors, NFT buyers, even curators) may suffer catastrophic loss.
If a takedown is issued on platforms like OpenSea or MagicEden due to a provenance violation, the assets (often purchased in good faith) risk becoming permanently stranded: value locked, art orphaned, and trust in decentralized systems destroyed.
In the short term, we may need to adopt verifiable data-chain registries for training sets and output tracking: essentially, ethical audit trails. Solutions like C2PA (Coalition for Content Provenance and Authenticity) show promise when expanded beyond photojournalism. Mid-term, the solution must be infrastructural: embed watermarking, disclosure protocols, and regulatory-compliant opt-in datasets at the model architecture level.
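To make the "ethical audit trail" idea concrete, here is a minimal sketch of a hash-chained provenance log in Python. This is not C2PA itself (C2PA manifests are signed binary structures with their own specification); the class name `ProvenanceLog` and the field layout are illustrative assumptions. The point it demonstrates is the core property an audit trail needs: each entry commits to the hash of the previous one, so any after-the-fact tampering breaks verification.

```python
import hashlib
import json
import time


class ProvenanceLog:
    """Illustrative append-only audit trail for generated artifacts.

    Each entry records a content hash of the artifact plus metadata,
    and chains the hash of the previous entry, so tampering with any
    earlier record invalidates the whole chain on verification.
    """

    GENESIS = "0" * 64  # placeholder prev-hash for the first entry

    def __init__(self):
        self.entries = []

    def record(self, artifact_bytes: bytes, metadata: dict) -> str:
        """Append an entry for one artifact; returns the entry's hash."""
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else self.GENESIS
        body = {
            "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
            "metadata": metadata,
            "prev_hash": prev_hash,
            "timestamp": time.time(),
        }
        # Canonical serialization (sorted keys) so the hash is reproducible.
        entry_hash = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append({**body, "entry_hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash and check the chain links."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if entry["prev_hash"] != prev or recomputed != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True
```

A real deployment would add digital signatures (so entries are attributable, not just tamper-evident) and anchor the chain head in an external registry, which is roughly the role C2PA's signed manifests play.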
This is not just a technical fix but a moral imperative to avoid continued digital enclosure of human creativity under ambiguous licensing pretenses.
Your call for discernment is timely. Without it, we risk not only losing the rights of the original creator but also confusing the essence of creation itself.
Warm regards,
Graham dePenros
COO, Open Code Mission
GenAI Ethics and AI/Neuro Ethics SME
Thank you, Graham!
Great topic and article! I do know that companies like Reddit restrict the use of their data for training new AIs through their Developer Terms of Service, prohibiting developers from using Reddit data to train LLMs without explicit permission.
Now I wonder if Substack posts can be taken just as easily?
Suchir pointed to a study showing that Stack Overflow usage has been down since ChatGPT launched. I wonder if X has a similar problem. Perhaps the bot problem relates to this? Need to fake activity because users chat with AI instead of with each other?
Yeah, the problem on X seems related. The bot thing just kills any user experience on the platform.
Great summary of the copyright issues! I don't see how these AI companies escape this issue if the courts rule fairly on the law.
Yes, there are potentially some similarities with humans with respect to knowledge acquisition, but there are also important differences. We rely on memorization far less. We learn processes, styles, techniques, etc. Almost no human artist wants to perfectly mimic another artist. They want to be known for their own style.
Sometimes an artist takes inspiration from another artist, but this is also very different: the artist typically acknowledges the fact, the result is still unique because they add their own style, and the original artist often sees it as a compliment because their work isn't being stolen. The inspired artist has put a lot of work into learning the techniques. It can't be mass produced and isn't a threat to the original.
One of the other ways I've proposed to distinguish human-created work from AI machines is that machines generate no new semantic information. It is another supporting argument for your statement that "AI's utility today is not about generating new knowledge, but instead feeding people with existing knowledge," FYI.
"Patterns still need content to be applied. In order to generate output there must be actual stuff on which the patterns can operate. So, where does this stuff come from? There are only two input sources: the prompt and the training data. There are no other options for a machine with no intelligence or reasoning, it is incapable of creating new information. Therefore, all of the information in the output is either from the prompt or the training data."
https://www.mindprison.cc/p/ai-creativity-types-explained-permutations-semantic-information
Exactly, and copyright law was written with the human plagiarist in mind. It takes on a whole other dimension when it's automated.