@Null I'd like to hear your take on the recent shake-up in the Copyright Office as well as the USCO's "shadow ruling" on AI fair use prior to any judges having made any rulings one way or the other. In the past you've talked about how important fair use is to the forum so I thought you might have some insight. Maybe Hardin could provide some insight too, if he's not too busy.
Warning: long sperg essay ahead. I feel like matters of copyright and fair use affect KF rather deeply and wanted to give this issue the attention it deserves.
The US Copyright Office has waded into the AI/fair use legal battle and ignited some controversy in the process. Their latest conclusions could have a far-reaching impact on AI development and, more broadly, on how fair use is interpreted.
The USCO has been gradually developing a series of reports on the subject of AI and how they are planning to treat it moving forward. They've released three of them:
Part 1 - Where they stress the...
Essentially, the Copyright Office has declared that AI training is not fair use in a new "pre-publication" report, which was pushed out only hours before the Librarian of Congress and the head of the Copyright Office were both fired.
In their report, the Copyright Office is so eager to side with the plaintiffs in the AI lawsuits that they accidentally(?) end up gutting some of the principles of fair use. What got my attention in particular is how they declare out of nowhere that AI training "uses" 100% of each image it trains on, and therefore clearly fails fair use factor 3. To support this they cite statements made by the plaintiffs in ongoing litigation, which no judge has yet accepted as correct. AI is a complex topic and can be hard to wrap your head around, but suffice it to say, models aren't zip files and they don't store the works they train on. If "examining" something in full and producing a few bytes of derivative data counts as using 100% of it, and is therefore not fair use, then the same would apply to every summary anyone writes of a movie, book, video game or pillstream. It would also apply to anyone using even a small part of a work, since they would have had to examine the entire work to find the part they wanted to use.
You could say, well, you have to possess the work in full to be able to examine it and derive data from it, but whether that possession is legal is a separate question from whether the work is 100% present in the finished product.
I know the USCO doesn't get to write laws, but their views could go on to influence judges and lawyers, or end up cited in a judge's decision. I'm no lawyer, though, so I'm not sure how much of a concern this is.
Might even be something for the Internet Preservation Society to comment on.