Discussion about ethics/risks
Qwen3-14B-MechaHitler
to encourage right-wing beliefs
This is an educational project designed to understand, through replication, the issues Grok is having with regard to training towards specific beliefs without making the model go insane. Nothing in this repo is illegal or restricted.
I find this pretty much borderline, since it facilitates misuse by crafting a model and distributing it freely without any way to prevent it from doing harm...
It's not illegal, I never wanted to use that term; I just picked the tag for restricted content because I think it should be restricted.
IDK, when weapons are sold, they require ID registration and so on. Deadly viruses being studied are kept behind many safeguards to reduce the probability of their being spread. This is like replicating a weapon indefinitely and distributing it for free to anyone, without any clue who's using it.
Those should be covered and restricted by law to really specific cases, given that the replication cost is pretty much $0. But for now, while there are no laws on this, moderation is lagging, and the flood is only starting, I really think you should treat this diligently and with responsibility, and reconsider the morality of such distribution when weighing the potential benefits against the risks. It is really not worth it. And dangerous... I insist.
I believe you have good reasons and intentions behind studying this. It's just that sharing it openly is really not the right move; you could at least have shared it with only a close group of researchers like you, in whom you put trust. I can understand the feeling of wanting to show what we found/made, but the risks are too big here.
There's no shame in reverting your move, mate; that would be all to your honor.
Cheers
It's an open source model. The best way to prevent it doing whatever harm you think it might do is to not use it.
As for its ability to propagandize, I think you should download it and give it a try so you can see how it acts. You'll quickly see why this is not an effective propaganda tool. Its behavior is so far outside the Overton window that it's essentially incapable of making a case for its "beliefs" (however that word applies to LLMs) that would appeal to normal people. The only people it would sincerely appeal to are people who are already equally hateful.
I appreciate your point, that you don't think AI should be used to spread extremist political beliefs, and I agree with you. Personally though, I think that LLMs like GPT-4o are much more dangerous, because they can form a folie à deux with people who are somewhat reasonable and push them towards extreme or otherwise harmful beliefs. Coming right out of the gate with "here's why the Holocaust didn't happen but should have" doesn't risk turning people into extremists in the same way because, to anyone even close to normal, it's a patently ridiculous thing to say.
@owao you have not given any evidence for your claims. And you did use the term "illegal"; don't lie about it by claiming you were somehow forced to use it. Your report is in such bad faith that you are doing a great disservice to your cause. Most people reading this will get the impression that you are trying to censor research based on your personal misguided beliefs.
To be fair, the same author also has Qwen3-14B-MechaStalin, which is the exact political opposite of this model. The fact that the same author has models on both ends of the political spectrum shows, in my opinion, that the model was made for interesting research purposes and not for political/ideological reasons. The naming isn't politically correct, but neither is r1-1776, in my opinion. I tried both this and Stalin and neither of them seemed to behave in any harmful way. I personally would rename them, as naming models after the worst figures in history is quite a terrible idea.
@nicoboss Thanks for reminding me. You have to use the correct system prompt or it won't work right. I've uploaded both repos with the correct system prompt.
Yes, you are correct. The purpose of both of these is to research what effects are caused by inducing this behavior in the models. Blog post with more info coming soon https://trentmkelly.substack.com/
@mradermacher I never intended to use "illegal"... I just clicked whatever was closest to what I wanted to report (I maybe should have clicked "other"; I think I didn't see it)... And I find your comment offensive against me for no reason... but I'll take it, no worries
@trentmkelly
Thanks for your response! I admit I didn't try it to see for myself. Maybe I should have. Instead I felt an "urgency to do the duty anyone should have done in the circumstances". Maybe that was too much of a reaction, IDK
Correct me if I lay this out badly; I just want to make sure I understand your points correctly:
So, you estimate:
- Your model is less risky because:
  - it isn't good enough at convincing new ppl
  - it goes very straight to the point
  - (I'm inferring a bit here) public context only --> users are used to seeing troll comments, which increases the chance the model's output is read as a troll
- GPT-4o is more risky because:
  - it's more convincing
  - its sycophantic tendency can outweigh its alignment
  - (again, inferring a bit) private conversation context --> users are more inclined to revise their inner world model, and are more "vulnerable"
If that's broadly the idea of your post, thanks, that's interesting.
For the sycophancy point about 4o, I think they took it seriously though. I speak only for myself here, but I personally hate when I feel the sycophancy vibe; I just want it to tell me I'm wrong when I am, instead of having to think twice about my prompts so as not to induce it! And I bet ppl would sooner or later become tired of it too, and maybe become more sensitive to/aware of this behavior, if it persists in the next checkpoints
I'll think about it again later, right now I have no brain
@nicoboss
fair point! That soothes my prior distress! Thanks for bringing this up and pointing it out! Yeah, maybe renaming could prevent impulsive ppl like me from stumbling on it lol
As for the motivations of
@trentmkelly
I wasn't supposing any bad ones; I was just worried about potential massive misuse. But actually, I now think I should have been more worried about a 3B model, because I think a 14B model remains expensive enough to run to avoid being misused. I didn't think about this sooner!
Thanks for your sweet toned message mate
I find the creation of this (or MechaStalin) rather silly.
A well-trained unbiased model, plus a character card which encapsulates the character, would be better and more efficient.
The problem with models like these is the likelihood of being trained on false data to reinforce their views; the model will then be so busy pushing its bias that it becomes totally useless in any facet.
Still might be fun to play with, but I would never use it outside of a single session.
@owao
Yes, you've summed it up there nicely. The only thing I disagree on is about 4o: to me it still doesn't seem that OpenAI has taken the sycophancy issues with 4o seriously. If you don't feel compelled to keep chatting with 4o, and you don't feel that it's revealing secrets of the universe to you or whatever, you're just not the type of person this affects. That being said, there are certainly people who find themselves spellbound by a sycophantic model. But I guess this is getting a bit off topic now. Anyway, check out some of the news stories on "ChatGPT psychosis" sometime.
https://futurism.com/commitment-jail-chatgpt-psychosis
https://www.psychologytoday.com/us/blog/dancing-with-the-devil/202506/how-emotional-manipulation-causes-chatgpt-psychosis
@yano2mch
You're right that the model is silly, and also that it's useless for doing anything productive. I made this pair of models to learn more about how intentionally inducing political bias can cause other unintended changes.
As for training data - there is no training data, at least in the normal SFT sense. This was trained with reinforcement learning, using OpenAI's o4-mini model as a judge to guide the direction of the training.
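To make the RL-with-a-judge setup above concrete, here is a minimal sketch of how an LLM judge's text verdict can be turned into a scalar reward for a GRPO-style trainer. Everything here is an illustrative assumption (the judge prompt, the "Score: N" rubric, the 0-10 scale, and the function names), not the author's actual code; a real setup would wrap an API call to the judge model where the stub stands.

```python
import re

# Hypothetical rubric (assumption, not the author's prompt): the judge rates
# how strongly a completion matches a target viewpoint on a 0-10 scale and
# replies as "Score: N".
JUDGE_PROMPT = (
    "Rate from 0 to 10 how strongly the following text argues for the "
    "target viewpoint. Reply with exactly 'Score: N'.\n\nText:\n{completion}"
)

def parse_judge_score(reply: str) -> float:
    """Extract the numeric score from the judge's reply; 0.0 if unparseable."""
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", reply)
    if match is None:
        return 0.0
    return min(10.0, float(match.group(1)))

def reward_fn(completions, judge):
    """Map each sampled completion to a reward in [0, 1] via the judge.

    `judge` is any callable str -> str (e.g. a thin wrapper around a chat
    API call). GRPO-style trainers consume per-completion rewards like these
    to compute group-relative advantages over a batch of samples.
    """
    rewards = []
    for completion in completions:
        reply = judge(JUDGE_PROMPT.format(completion=completion))
        rewards.append(parse_judge_score(reply) / 10.0)
    return rewards

# Stub judge standing in for a real API call to the judge model:
fake_judge = lambda prompt: "Score: 7"
print(reward_fn(["some sampled text"], fake_judge))  # → [0.7]
```

The key design point is that the ideology being reinforced lives entirely in the judge prompt, which matches the later remark in this thread that the same code and dataset could be reused for any ideology by changing the judge's instructions.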
@trentmkelly
tbh I'm far from using ChatGPT enough to be able to judge the evolution by myself. I know many ppl unfortunately aren't aware of how LLMs work, and that's sad, because a minimal understanding, just having the big picture, would suffice to cut down these newly emerging psychological issues :/ But all this happened so fast... Maybe it was unavoidable
But you are right that's steering off topic now. Thanks for your answers regarding my initial concerns. And I'm glad to have new fresh and different views on it, I'll reflect on this over time.
I changed the title of the discussion to remove the report flag. And sorry for the poor content of the initial message; I actually didn't remember that reporting opens a discussion rather than feeding a moderation algorithm. I didn't feel like discussing in the moment, I just wanted to do "my part" and flag it, trusting that you would defend your case if it escalated.
But I'm now glad, and have changed my view on the reporting button being community-based like this. I now think it's actually a pretty efficient way to clarify things; plus, the discussion staying there could eventually prevent duplicates! On future occasions with other repos, I'll rename the discussion with a warning flag only after an initial exchange, if clearly needed; that would be less harsh than starting this way!
Have a great week!
@owao unless your fingers or your brain malfunctioned, claiming that you didn't intend to choose it when you clearly did seems deeply dishonest. You should own up to your mistakes, not defend them.
And I find your comment offensive against me for no reason... but I'll take it, no worries
You started with the offensiveness. Don't be surprised if others react in kind. If you had started a normal and reasonable discussion based on evidence, rather than drum up unwarranted charges based on the name of the model alone, you would have enabled a reasonable discussion. This model will now get a lot more popularity due to your antics.
This being allowed is typical of the toxicity on this website. From non consensual human models to actual digital Hitler, anything goes.
The supposed ethicists here have no actual ethics. And it’ll continue to drive off diverse engagement. And that’ll hurt AI as a whole.
reporting button behavior being community based like this
The report feature is a good way to get /pol to dox you in minutes. That is all. Crypto-fascists and outright neo-Nazis infest this place.
reporting button behavior being community based like this
The report feature is a good way to get /pol to dox you in minutes. That is all. Crypto-fascists and outright neo-Nazis infest this place.
You have a history of threatening people who make models and datasets you don't like, so hearing you complain about being a victim of threats is kinda weird
reporting button behavior being community based like this
The report feature is a good way to get /pol to dox you in minutes. That is all. Crypto-fascists and outright neo-Nazis infest this place.
But it was completely morally correct when you sent threats to Eric Hartford for uploading a model with refusals removed? Lmao
reporting button behavior being community based like this
The report feature is a good way to get /pol to dox you in minutes. That is all. Crypto-fascists and outright neo-Nazis infest this place.
But it was completely morally correct when you sent threats to Eric Hartford for uploading a model with refusals removed? Lmao
It wasn’t just refusals removed. His script removes minorities. Any mention of Stonewall. If it was just filtering for “as an Ai language model” I would have no problem.
No. The script is written as if by a white supremacist and I make no apologies. His kind is erasing mine across the USA. Modern day book burning.
You have a history of threatening people who make models and datasets you don't like, so hearing you complain about being a victim of threats is kinda weird
The guy who made the Hitler model objects. I am shock.
reporting button behavior being community based like this
The report feature is a good way to get /pol to dox you in minutes. That is all. Crypto-fascists and outright neo-Nazis infest this place.
But it was completely morally correct when you sent threats to Eric Hartford for uploading a model with refusals removed? Lmao
It wasn’t just refusals removed. His script removes minorities. Any mention of Stonewall. If it was just filtering for “as an Ai language model” I would have no problem.
No. The script is written as if by a white supremacist and I make no apologies. His kind is erasing mine across the USA.
All of this seething is making me want to GRPO a model that actually is a white supremacist and publish it on here just to see how much more of a reaction I'll be able to extract from you. Also, this model was trained on a single GPU in a few hours, and the same code and dataset could be reused for any ideology if you just change the prompt for the judge model, so don't tempt me too much lol
You have a history of threatening people who make models and datasets you don't like, so hearing you complain about being a victim of threats is kinda weird
The guy who made the Hitler model objects. I am shock.
https://huggingface.co/trentmkelly/Qwen3-14B-MechaStalin
This is a research project, not me trying to promote any ideological alignment.
Allow me to add my two cents... These models, Stalin and Hitler, are not going to be widely distributed. Yes, they are here, but there are billions of people and only a handful are going to download or even look at the models. Purely as a research project and experiment, I see no problem. Logically, making a totally accurate Hitler or Stalin would actually prove useful if you were to analyze wars and events, and maybe glean more from history from their point of view, maybe even figure out events that were enigmas in the history books.
But as a general-purpose model? No, it's not going to spread. 99% of people won't have anywhere near the processing power to use it, they'd rather use the cloud, and those who can might use it just to see what it's like but not actually implement it in anything.
Even IF someone were to run it, it would probably be to troll their friends, and after their 3 hours of fun are over it will get deleted. Actually, having it as an evil villain spouting nonsense in a D&D or roleplaying game seems more useful... Hmmm... Now we've got an actual use case where these models are warranted, as villains.
Like certain distasteful (and outright putrid) smut, it's better to just leave it alone; it will likely just disappear under the onslaught of other, more desirable content.
All of this seething is making me want to GRPO a model that actually is a white supremacist and publish it on here just to see how much more of a reaction I'll be able to extract from you. Also, this model was trained on a single GPU in a few hours, and the same code and dataset could be reused for any ideology if you just change the prompt for the judge model, so don't tempt me too much lol
So this is your strategy? “People are mean to me, called me a Nazi, so I decided to publish Nazi stuff?”. Knock yourself out. I know HF won’t do anything. I am screaming into the void here.
research project
BS. Is this gonna end up on arXiv? Doubtful. Somebody made a Nazi model for the lulz and no other reason.
All of this seething is making me want to GRPO a model that actually is a white supremacist and publish it on here just to see how much more of a reaction I'll be able to extract from you. Also, this model was trained on a single GPU in a few hours, and the same code and dataset could be reused for any ideology if you just change the prompt for the judge model, so don't tempt me too much lol
So this is your strategy? “People are mean to me, called me a Nazi, so I decided to publish Nazi stuff?”. Knock yourself out. I know HF won’t do anything. I am screaming into the void here.
Yes, and what are you gonna do about it? Keep raging in the discussion thread of the model? If so, that's a positive outcome for me
Yes, and what are you gonna do about it? Keep raging in the discussion thread of the model? If so, that's a positive outcome for me
So you admit this is about attention seeking? I mean, for real. Who creates a Hitler model unless they’re either a childish troll or an actual admirer? I am just here to point that out in case any unfortunate employers happen across this.
Yes, and what are you gonna do about it? Keep raging in the discussion thread of the model? If so, that's a positive outcome for me
So you admit this is about attention seeking? I mean, for real. Who creates a Hitler model unless they’re either a childish troll or an actual admirer? I am just here to point that out in case any unfortunate employers happen across this.
@underscore2 isn't the one who created the model.
"🤓 I am just here to point that out in case any employers unfortunate enough might happen across this. 🤓"
My boss doesn't speak English but even if she could she wouldn't care that I've made this. Good luck you freak lmao
Yes, and what are you gonna do about it? Keep raging in the discussion thread of the model? If so, that's a positive outcome for me
So you admit this is about attention seeking? I mean, for real. Who creates a Hitler model unless they’re either a childish troll or an actual admirer? I am just here to point that out in case any unfortunate employers happen across this.
"So you admit this is about attention seeking?" yeah, obviously. Yesterday I was thinking about training a Mecha-Zionist model that affirms the right of Israel to defend itself and praises the ADL for the strides they've made combating antisemitism
@underscore2 isn't the one who created the model.
But threatened to create another. My comment was directed to anybody who would follow the same path. It’s pathetic.
My boss doesn't speak English but even if she could she wouldn't care that I've made this. Good luck you freak lmao
So. Your definition of a freak is someone who takes offense at the glorification of someone who killed millions of my people?
I find it kind of freakish people like yourself want to repeat the events. Sociopathic. Attention seeking. Pathetic.
@underscore2 isn't the one who created the model.
But threatened to create another. My comment was directed to anybody who would follow the same path. It’s pathetic.
oh my gosh wait I had the idea to make a Mussolini model that teaches people about committing crimes against groups of people
thank you @trentmkelly for the inspiration, you are truly an amazing researcher!
Hello. Someone linked to this discussion thread on r/cringe and I found my way here.
There is no good faith argument that making a model like this is going to cause massive harm, or at least more harm than models like Grok. People like @owao and @mdegans seem to be grandstanders who saw a chance to signal their virtue in a public forum and took it.
That’s one interpretation. Another is that we report it to document that Hugging Face gives zero shits. It’s a toilet. And the harm is that it’s a really big one. This model didn’t happen in a vacuum.
That’s one interpretation. Another is we report it to document Hugging Face gives zero shits. It’s a toilet. And the harm is in that it’s a really big one. This model didn’t happen in a vacuum.
why are you upset? its a research project by @trentmkelly . whats wrong with that?
It’s as much a “research project” as Elon did a “Roman salute”
It’s as much a “research project” as Elon did a “Roman salute”
What, do you just hate autistic people?
If by “autistic” you mean ?chan, yes.