LinkedIn gets sued for ignoring their contracts
LinkedIn recently made silent updates to their privacy policy, and those updates tell a story: they’ve been violating their Premium contracts
LinkedIn is currently being sued over their AI efforts. Like many companies with a lot of money and data, LinkedIn is trying to turn their treasure trove into an LLM.
LinkedIn recently added a new privacy toggle to their settings page related to AI. It defaulted to “On,” and it controls whether personal data can be used to train AI models. The switch had some helper text explaining that it governed whether “LinkedIn or its affiliates” can train models on personal user data.
Later that day, they released an update to their privacy policy. As you may know, most privacy policy updates are declarations that your life just got worse in some minor way. This was no different; it declared that your personal data may be used in various ways for AI models and their training.
This combination of evidence was strong enough that someone filed a class-action lawsuit alleging that LinkedIn violated their own contracts by making these changes. When they updated their privacy policy, LinkedIn also linked to a page called “Responsible AI Principles.” Honestly, when I read it, none of this seemed worth a second glance. But this is where the complaint gets juicy.
ALESSANDRO DE LA TORRE v LINKEDIN CORPORATION
18. Most notably, LinkedIn buried a crucial disclosure in an “FAQ” hyperlink (shown in italics above) rather than in the Privacy Policy: “The artificial intelligence models that LinkedIn uses to power generative AI features may be trained by LinkedIn or another provider” (emphasis added). Admitting that data may be disclosed to “another provider” in this secondary document suggests that LinkedIn was aware its previous terms did not authorize these practices and was attempting to avoid further scrutiny.
19. Additionally, the FAQ states: “Opting out means that LinkedIn and its affiliates won’t use your personal data or content on LinkedIn to train models going forward, but does not affect training that has already taken place.” LinkedIn gives up the game with this statement—it indicates that LinkedIn users’ personal information is already embedded in generative AI models and will not be deleted, regardless of whether they opt out of future disclosures.
Normally you’d be out of luck. “Drats, they found more rights that they forgot to take away from me.” But LinkedIn Premium subscribers signed an extra “LinkedIn Subscription Agreement” that basically says LinkedIn will not give confidential data to third parties, and it lists the specific third-party companies that may process data for specific reasons. None of those companies is listed for machine-learning model training. So the argument boils down to, “Based on their behavior, they’re probably doing this. We have an agreement saying they can’t. What if this chatbot just spews our personal data into Bing searches, Word documents, LinkedIn comments, etc.?”
Feel free to read the whole complaint. I did. There’s no smoking gun; it just tells a story based on LinkedIn’s public behavior. For all we know, LinkedIn properly controls the data on the backend, and the value of the UI switch gets overridden by a privacy-protection flag before anything ships to a third party (I sketch what that check might look like after the scenario below). I’m just a software engineer and not a legal guy. But as a software engineer, this all actually sounds really plausible? Like, imagine the following…
Some executive, under intense pressure to tell the story that they are “doing AI,” throws lightning bolts at the organization until they are making an AI chatbot.
To speed up the timelines, the company contracts out to various AI vendors: one for training, one for operations, one for safety-checking responses, etc.
All of the disciplines start doing their work. The platform people integrate the third parties. The product managers and designers jam on the experience. The backend team plans how to productionize the chat models. The frontend teams start building the screens and controllers on the web and in the apps. The user researchers scream “nobody wants or needs this!” into their pillows at night.
The project leadership holds a legal review session where they outline the valid data sources. It is made perfectly clear that there are special types of data that can’t be included. Everyone in the meeting understands this.
Somewhere in the “Legal → Product VP → Engineering VP → Machine Learning Tech Lead → Data Export Project Lead → Summer Intern That Actually Implemented This” telephone chain, the exact specification of the forbidden data is forgotten or misunderstood.
After launch, a director+ proudly proclaims that everyone worked hard and met their deadlines, and they now have a functional model that was trained on their unique competitive advantage: every scrap of user data spanning decades.
Someone in compliance pulls the director aside and says, “What do you mean when you say that the model was trained on all user data? Who trained it?”
The director asks engineering how much time and money it would cost to retrain the model without that data. Engineering gives them a ballpark estimate that is a substantial fraction of the total cost of the project, and it would set them back months.
The company enters damage-control mode so that they don’t have to spend that.
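To make the “one forgotten check” failure mode concrete, here’s a purely hypothetical sketch in Python. None of these names, fields, or rules come from LinkedIn’s actual systems; it’s just the general shape of the filtering that either does or does not happen before user data ships off to a third-party trainer.

```python
from dataclasses import dataclass

# Everything here is invented for illustration; no field or rule below comes
# from LinkedIn's actual code or contracts.

@dataclass
class MemberRecord:
    member_id: int
    profile_text: str
    allow_ai_training: bool   # the value of the privacy toggle in the UI
    is_premium: bool          # Premium members signed the extra Subscription Agreement

def export_training_batch(records, honor_contracts=True):
    """Build the batch that gets shipped to a (hypothetical) third-party trainer."""
    batch = []
    for r in records:
        if not r.allow_ai_training:
            continue  # respect the UI toggle
        if honor_contracts and r.is_premium:
            continue  # the Subscription Agreement says this data stays in-house
        batch.append({"id": r.member_id, "text": r.profile_text})
    return batch

if __name__ == "__main__":
    members = [
        MemberRecord(1, "Thought leader. Synergy enthusiast.", True, False),
        MemberRecord(2, "Open to work.", True, True),      # Premium: must be excluded
        MemberRecord(3, "I never post, I only lurk.", False, False),
    ]
    print(export_training_batch(members))                         # member 1 only
    print(export_training_batch(members, honor_contracts=False))  # oops: Premium data included
```

The entire contractual promise lives in one default argument and one `if` statement. That is exactly the kind of detail a long telephone chain loses.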
Why do I think this is plausible? I personally know someone this happened to![1] Their product manager periodically asked them data questions when writing product specs. Stuff like “What percentage of companies have used this feature within the past month?” The engineer would dutifully run queries or otherwise calculate the answer. A year after they left the company, one of their old teammates told them, “You know all of those queries you were running for our PM? I don’t know how, but somebody figured out that we have a few B2B contracts promising that customer data must not be used for statistical aggregation purposes. The people who negotiated the contracts never told anyone. I’m literally working on a project to fix this before we get sued into oblivion.” Oops!
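If you want to picture what those queries looked like, here is a toy Python reconstruction with a made-up schema and a made-up list of exempt customers (the real thing was presumably SQL against a data warehouse). Both answers look equally plausible, and nothing in the data itself tells you which one the contracts let you compute.

```python
# Hypothetical reconstruction of the "queries for the PM" story above.
# Company names, fields, and the exemption list are all made up.

companies = [
    {"company_id": "acme",    "used_feature_last_month": True},
    {"company_id": "globex",  "used_feature_last_month": False},
    {"company_id": "initech", "used_feature_last_month": True},
]

# The B2B contracts nobody told engineering about: these customers opted out
# of being included in any statistical aggregation.
AGGREGATION_EXEMPT = {"initech"}

def pct_using_feature(rows, respect_contracts):
    """Percentage of companies that used the feature in the past month."""
    if respect_contracts:
        rows = [r for r in rows if r["company_id"] not in AGGREGATION_EXEMPT]
    if not rows:
        return 0.0
    return 100.0 * sum(r["used_feature_last_month"] for r in rows) / len(rows)

print(pct_using_feature(companies, respect_contracts=False))  # ~66.7: what the PM got
print(pct_using_feature(companies, respect_contracts=True))   # 50.0: what the contracts allowed
```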
In practice, engineers do not have access to the legal agreements that are made, and they don’t have the legal background to understand them.[2] So ultimately, these kinds of mistakes are extremely hard for organizations to avoid. This is likely why you see “move fast and break things” as a dominant strategy in tech: if you’re going to break things anyways, you might as well break them quickly so that you can accelerate your rate of learning.
I’m going to check in periodically to see how the case progresses. I’m rooting for the plaintiff, but only because I think the world should be spared from a LinkedIn LLM. I’m not the first person to make this point, but can you even imagine? LinkedIn is already a case in point for the Dead Internet Theory, the conspiracy theory that human activity on the internet has been replaced by bots and algorithmic curation. Why would you want to make it easier to spew entry-level job postings to director+ people? Who needs to be the 5000th person to rip off the Dog CEO post so badly that it should autocomplete when they type “nobody”?
[1] Parts of this story are vague for obvious reasons.
[2] This clearly doesn’t stop them from writing an entire newsletter on the subject.