Challenges in Augmenting Large Language Models with Private Data

Science & Technology

A Google TechTalk, presented by Ashwinee Panda, 2024-05-01
ABSTRACT: LLMs are making first contact with more data than ever before, opening up new attack vectors against LLM systems. We propose a new practical data extraction attack that we call "neural phishing" (ICLR 2024). This attack enables an adversary to target and extract PII from a model trained on user data without needing specific knowledge of the PII they wish to extract. Our attack is made possible by the few-shot learning capability of LLMs, but this capability also enables defenses. We propose Differentially Private In-Context Learning (ICLR 2024), a framework for coordinating independent LLM agents to answer user queries under DP. We first introduce new methods for obtaining consensus across potentially disagreeing LLM agents, and then explore the privacy-utility tradeoff of different DP mechanisms as applied to these new methods. We anticipate that further LLM improvements will continue to unlock both stronger adversaries and more robust systems.
Speaker: Ashwinee Panda (Princeton University)
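To make the consensus-under-DP idea in the abstract concrete, here is a minimal Python sketch of one plausible aggregation step; it is not the talk's implementation. It assumes each agent answers from a disjoint shard of the sensitive data (so one record influences at most one vote) and that a single answer is released via report-noisy-max with Laplace noise, a standard epsilon-DP selection mechanism. The function name private_consensus and the toy answers are illustrative only.

```python
# Hypothetical sketch of private consensus across independent LLM agents.
# Assumption: each agent is prompted with a disjoint partition of the private
# data, so changing one user's record changes at most one agent's vote.
import numpy as np
from collections import Counter

def private_consensus(agent_answers: list[str], epsilon: float, rng=None) -> str:
    """Release one answer by noisy majority vote (report-noisy-max)."""
    rng = rng or np.random.default_rng()
    counts = Counter(agent_answers)          # votes per candidate answer
    candidates = list(counts)
    # Add Laplace noise (conservative scale 2/epsilon for sensitivity-1 counts)
    # to each vote count, then release only the argmax.
    noisy = [counts[c] + rng.laplace(scale=2.0 / epsilon) for c in candidates]
    return candidates[int(np.argmax(noisy))]

# Example: ten agents answer the same user query from their own data shards.
answers = ["Paris"] * 7 + ["Lyon"] * 2 + ["Marseille"]
print(private_consensus(answers, epsilon=1.0))   # usually "Paris"
```

Swapping in other mechanisms (e.g., Gaussian noise over embedding-space aggregates instead of discrete votes) changes the privacy-utility tradeoff the abstract refers to, which is what the talk explores.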

Comments: 3

  • @deeliciousplum (13 days ago)

    While only 7 minutes in, I am being shown that LLMs of all ilk are sponging up the private information of beginner users (more seasoned users would know not to share it), users who may be relying on both server-side and locally hosted LLMs to help with their coding projects. As far as I know, there is no one to hold accountable if an LLM has pilfered a user's private data, data which may also include family members and everyone in the contact lists the user pulled into one of those projects. There ought to be clear, safe, and ethical protocols, including a feature that lets a user disable an LLM from gathering sensitive or private data. It is already next to impossible to press Google, Facebook, or other social media sites to remove content a user deems private. Imagine trying to find a human being who can be directed to delete the private data that LLMs, now integral parts of our browsers, operating systems, and coding tools, have pilfered during everyday use. Why do we rapidly roll out tech that is not ready to be used safely? Sigh.
