ChatGPT can effectively train on your prompts. They say they don't, but usage is fed into a giant anonymized dataset, and I'm like 90% certain they're lying about how much they use your data (just like Meta did). If you feed it a bunch of data, it can potentially leak snippets of it later. Asking for the document outright is probably disallowed by the guardrails, but now GPT will know things it shouldn't.

Can someone explain how that document is now considered public domain because he uploaded it to ChatGPT? Is it possible to actually retrieve it?
There have been reported cases of programmers feeding LLMs private codebases containing hardcoded credentials, and the LLM later vomiting up those credentials to random users.
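If you're piping code at an LLM API, the cheap mitigation is to scrub obvious secrets before the text ever leaves your machine. Here's a minimal Python sketch of that idea; the patterns and placeholder names are illustrative only, not a real secret scanner (for real deployments you'd want something like gitleaks or trufflehog):

```python
import re

# Minimal pre-submission scrubber, assuming you control the tooling that
# ships code to an LLM API. Patterns here are illustrative, not exhaustive.
SECRET_PATTERNS = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase alphanumerics.
    (re.compile(r"AKIA[0-9A-Z]{16}"), "<AWS_ACCESS_KEY>"),
    # Hardcoded assignments like password = "hunter2" or api_key = '...'.
    (re.compile(r"(?i)(password|passwd|secret|api_key|token)\s*=\s*['\"][^'\"]+['\"]"),
     r"\1 = '<REDACTED>'"),
]

def scrub(source: str) -> str:
    """Replace likely credentials with placeholders before text leaves the machine."""
    for pattern, replacement in SECRET_PATTERNS:
        source = pattern.sub(replacement, source)
    return source

if __name__ == "__main__":
    snippet = 'db_password = "hunter2"\naws_key = "AKIAABCDEFGHIJKLMNOP"'
    print(scrub(snippet))
```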
This is one of the fundamental problems with LLMs: they're giant opaque piles of linear algebra and statistics. You can't easily remove data from them once it's been trained in, because the neural connections representing that data are non-obvious and spread throughout the model.
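You can see the "smeared everywhere" problem even in a toy model. The sketch below is my own invention (a tiny bigram network, nothing from a real LLM): it memorizes a fake secret during training, then tries to "delete" it by zeroing a chunk of weights. The damage is unpredictable because no single weight stores the string:

```python
import torch
import torch.nn as nn

# Toy bigram model: predicts the next character from the current one.
# "secret" stands in for memorized training data; every character in it
# is unique, so a bigram lookup is enough to memorize the whole string.
torch.manual_seed(0)
secret = "API_KEY=hunter2"
chars = sorted(set(secret))
stoi = {c: i for i, c in enumerate(chars)}

xs = torch.tensor([stoi[c] for c in secret[:-1]])  # current characters
ys = torch.tensor([stoi[c] for c in secret[1:]])   # next characters

model = nn.Sequential(nn.Embedding(len(chars), 32), nn.Linear(32, len(chars)))
opt = torch.optim.Adam(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):  # overfit on purpose: this IS memorization
    opt.zero_grad()
    loss_fn(model(xs), ys).backward()
    opt.step()

def recite(start: str) -> str:
    """Greedily regenerate the memorized string from its first character."""
    out = start
    for _ in range(len(secret) - 1):
        logits = model(torch.tensor([stoi[out[-1]]]))
        out += chars[logits.argmax().item()]
    return out

print(recite("A"))  # -> API_KEY=hunter2 (the model regurgitates training data)

# Attempt to "delete" the secret by zeroing 25% of the output layer's
# weights. The result is unpredictable: the string may survive partially
# or entirely, because it isn't stored in any one identifiable weight.
with torch.no_grad():
    w = model[1].weight
    w[torch.rand_like(w) < 0.25] = 0.0
print(recite("A"))
```

Scale that up to billions of weights and you get the unlearning problem: there's no row in a database you can drop, so the only reliable fix is not training on the data in the first place.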