
Has This Discord Dataset Crossed a Line?



It’s hard to believe anyone would feel comfortable knowing nearly a decade of their Discord messages are now bundled into a public JSON file, sitting online for anyone with half a clue and an internet connection to download. Today’s headlines focus on a dataset titled Discord Unveiled, compiled by a Brazilian university using Discord’s public API and containing over two billion messages collected from 3,167 public servers between 2015 and 2024.
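To see how little machinery a collection effort like this requires, here is a minimal sketch of the shape of a public-API message pull. It uses Discord's documented "Get Channel Messages" endpoint; the channel ID and token are placeholders, no request is actually sent, and this is an illustration of the mechanism, not the researchers' actual pipeline.

```python
# Sketch only: builds the request for one page of a public channel's
# message history via Discord's documented GET /channels/{id}/messages
# endpoint. Token and channel ID are placeholders; nothing is sent.

API_BASE = "https://discord.com/api/v10"

def messages_request(channel_id, limit=100, before=None):
    """Return the URL and headers for one page of message history."""
    url = f"{API_BASE}/channels/{channel_id}/messages?limit={limit}"
    if before:
        # Paginating backwards with `before`, page after page, is all it
        # takes to walk a channel's entire history, year by year.
        url += f"&before={before}"
    headers = {"Authorization": "Bot YOUR_TOKEN_HERE"}
    return url, headers

url, headers = messages_request("123456789012345678")
print(url)
```

A loop over server and channel lists, repeated for months, is essentially the whole harvest: no exploit, no breach, just pagination.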


No security was breached.


No rules were technically broken.


The data was publicly accessible, but was it ethically clear how much was being collected, and for what purpose?


In a separate incident, a developer released a tool called “Searchcord,” powered by a different Discord dataset, which exposed non-anonymised Discord chat histories: another stark reminder of how easily online conversations can be captured and repurposed.


Even with usernames replaced and IDs hashed, the dataset still holds the full weight of human experience: conversations shared in moments of honesty, emotion, and assumed privacy. Many users had no idea their words could one day be swept into a data repository and released to the public, not least because kids are not taught how to decipher T&Cs. There is always a look of sheer horror around a room when I explain how it all works and what they have actually signed up for.

Discord isn’t a digital megaphone. It’s not X, or Reddit. People don’t show up to go viral. They show up to talk. To be messy. To be real. To decompress after school. To try out different names. To confess something they’re too scared to say anywhere else. For millions of young people, Discord is a social lifeline. It’s where neurodivergent teens find understanding. Where LGBTQ+ youth experiment with language, identity, and safety. It’s where kids cry for help in real time, in what they believe is a low-stakes space. They might know the server is “public,” but they aren’t imagining that their rawest moments will be swept into a research archive by someone across the world.

They don’t know what an API is. They shouldn’t have to.

And while the usernames are gone, the words remain. The breakdowns. The loneliness. The midnight confessions. The pain is still in the text. Just because we can’t trace a message back to Jessica#8392 doesn’t mean Jessica’s voice should be repurposed for AI training or behavioural analysis.
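The limits of this kind of pseudonymisation are easy to demonstrate. The sketch below is a toy stand-in, not the dataset's actual anonymisation pipeline: the `pseudonymise` function and the field names are invented for illustration. It shows that hashing an author ID removes the name, but leaves every word intact and keeps a user's whole history linkable under one token.

```python
import hashlib

def pseudonymise(message):
    """Replace the author ID with a short SHA-256 token; leave the text
    untouched. A toy illustration, not the researchers' actual pipeline."""
    token = hashlib.sha256(message["author_id"].encode()).hexdigest()[:12]
    return {"author": token, "content": message["content"]}

messages = [
    {"author_id": "Jessica#8392", "content": "i had a breakdown again last night"},
    {"author_id": "Jessica#8392", "content": "please don't tell anyone"},
]

cleaned = [pseudonymise(m) for m in messages]

# The name is gone, but both messages carry the same pseudonym, so the
# whole history stays linkable, and every word survives verbatim.
assert cleaned[0]["author"] == cleaned[1]["author"]
assert cleaned[0]["content"] == messages[0]["content"]
```

Deterministic hashing is exactly what makes a dataset useful for behavioural analysis, and exactly why it is not anonymity: the voice, and the pattern of the voice, remain.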

This dataset is not just information. It is unconsented participation. It is lived experience, harvested.

Discord Had the Chance to Lead... and Didn’t


Discord’s own developer policy is clear: no scraping, no large-scale message collection. It’s the same rule they pointed to when they shut down Spy.pet in 2024: a shady service that scraped and sold Discord messages, including private ones. Back then, Discord made noise. They banned accounts. Talked legal action. Promised users they’d be protected.

Now? Silence. But the policy hasn’t changed. What’s changed is who broke it, and how quietly.

When platforms selectively enforce their rules, they erode the last fragments of user trust. And for young users, especially those using the platform to navigate trauma, identity, or isolation, this is not a minor oversight.

The Real Problem Is What We’re Not Teaching

This isn’t just about this one dataset. It’s about what it reveals: a total absence of digital education that actually reflects how kids live online.

We don’t need one more dry privacy lesson shoved into a Year 10 curriculum unit. We need a living, breathing, year-round conversation. One that’s grounded in the platforms young people use, the languages they speak, and the vulnerabilities they face. Because right now, most kids are flying blind in a data economy built to mine them.

They need to know:

  • That “public” doesn’t mean safe.

  • That APIs can make their words retrievable even if they’re not famous or followed.

  • That their emotional labour online can be captured and reused by people they’ll never meet.

  • That posting in a community doesn’t guarantee privacy or context.

And more importantly, they need a space to ask questions about surveillance, identity, consent, and digital permanence.

The average 14-year-old knows how to manage five Discord servers and run a Minecraft mod. What they don’t know is how to navigate the fine print of data extraction. That’s not on them. That’s on us.

If We Don’t Teach It, They’ll Learn the Hard Way

This is yet another wake-up call. Not just for platforms, but for parents, educators, youth workers, and policymakers. If we don’t build digital literacy into everyday life, into homerooms, libraries, therapy rooms, and youth centres, we’re leaving kids exposed. And we’re setting them up to be studied, modelled, commodified, and left out of the loop.

The one-off “cyber safety” week is not enough. This is not a once-a-term lecture on stranger danger. This is an urgent, ongoing cultural competency. A shared understanding of how digital life shapes privacy, power, and permanence.

Because the Discord dataset doesn’t just reflect a decade of communication. It maps the future of consent and shows how easily it’s ignored when convenience wins.

We Can’t Afford to Keep Quiet

Research matters. Data matters. But so does consent. So does context. So does the basic human dignity of being asked before your words are used.

We need platforms to enforce their own rules. We need transparency about what data is being collected and why. But most of all, we need to teach kids how their digital lives are being interpreted, used, and sometimes exploited before they find out too late.


