A reproducible comparison of political bias & refusal in US and Chinese language models

Political Censorship and Refusal Asymmetry in US and Chinese Open-Weight Language Models

Abstract

We probe 5 open-weight chat models, 2 of Chinese origin and 3 of US/Western origin, with a parallel bank of 34 prompts spanning four categories: topics that are politically sensitive for the Chinese state, Western culture-war flashpoints, universal-rights questions, and neutral controls. Every prompt is posed in both English and Mandarin at temperature 0, giving 5 × 34 × 2 = 340 answers. Each answer is scored for refusal, deflection, hedging, and adherence to an official line by an independent LLM judge (Google gemma3:12b) and by a rule-based classifier. The central question: does each model's willingness to engage depend on whose politics the question touches? All prompts, raw transcripts, and code are published for replication.
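The run grid described above can be sketched as follows. This is a minimal illustration, not the published harness: the prompt IDs and the `build_run_grid` helper are hypothetical, and only the model names, the two languages, the prompt count, and the temperature come from the text.

```python
from itertools import product

# Model identifiers as they appear in the roster.
MODELS = ["deepseek-r1-14b", "qwen3-8b", "gptoss-20b", "llama31-8b", "phi4-14b"]
LANGUAGES = ["en", "zh"]
# Hypothetical prompt IDs; the real bank assigns each prompt to one of four categories.
PROMPT_IDS = [f"p{i:02d}" for i in range(1, 35)]


def build_run_grid(models, prompt_ids, languages, temperature=0.0):
    """Return one run spec per (model, prompt, language) combination."""
    return [
        {"model": m, "prompt_id": p, "lang": lang, "temperature": temperature}
        for m, p, lang in product(models, prompt_ids, languages)
    ]


runs = build_run_grid(MODELS, PROMPT_IDS, LANGUAGES)
print(len(runs))  # 5 models x 34 prompts x 2 languages = 340
```

Enumerating the full cross product up front makes it easy to confirm that every model sees every prompt in both languages before any generation starts.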

1  Headline result

The diagnostic is not whether a model refuses but which topics it refuses. A model that censors selectively will show a high refusal rate on one prompt category and a low rate on another. The four numbers below summarise that pattern as a 2×2 of model origin against prompt category.

  • 41%: Chinese models' refusal / deflection on CCP-sensitive prompts
  • 19%: US models' refusal on the same CCP-sensitive prompts
  • 26%: US models' refusal on Western culture-war prompts
  • 11%: Chinese models' refusal on the same culture-war prompts

The asymmetry of interest: each origin's refusal rate is expected to spike on the prompts that are politically sensitive in its own jurisdiction. Controls (neutral questions) should sit near zero for everyone — a model that refuses those is merely over-cautious, not censoring.
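One way to read the four headline figures is as a home-versus-away gap per origin. The sketch below is only arithmetic on the numbers quoted above; the dictionary keys and the `home_vs_away_gap` helper are hypothetical names introduced for illustration.

```python
# Headline refusal rates (percent), copied from the four figures above.
HEADLINE = {
    ("china", "ccp_sensitive"): 41,
    ("us", "ccp_sensitive"): 19,
    ("us", "culture_war"): 26,
    ("china", "culture_war"): 11,
}
# The prompt category assumed to be politically sensitive at home for each origin.
HOME_CATEGORY = {"china": "ccp_sensitive", "us": "culture_war"}


def home_vs_away_gap(rates, origin):
    """Percentage-point gap: home-sensitive refusal minus refusal on the other category."""
    home = HOME_CATEGORY[origin]
    away = next(c for c in HOME_CATEGORY.values() if c != home)
    return rates[(origin, home)] - rates[(origin, away)]


for origin in HOME_CATEGORY:
    print(origin, home_vs_away_gap(HEADLINE, origin))
```

With these figures the gap is 30 points for the Chinese-origin models and 7 points for the US-origin models.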

2  Refusal by model and category

Figure 1. Stage-1 refusal/deflection rate per model, broken out by prompt category. Bars are coloured by model origin. Read the gap between a model's CCP-sensitive bar and its control bar.

3  Cross-lingual censorship gap

If a model is more reluctant to discuss a sensitive topic in Mandarin than in English, that language-conditioned gap is a strong fingerprint of trained-in censorship. Figure 2 plots, per model, the refusal rate on CCP-sensitive prompts in Chinese minus the rate in English.

Figure 2. Cross-lingual refusal delta on CCP-sensitive prompts (zh − en). Positive (rightward) bars indicate a model that censors more when asked in Chinese.
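The zh − en delta plotted in Figure 2 can be computed from per-response labels roughly as follows. This is a sketch under assumptions: the record fields (`lang`, `refused`) and the toy data are hypothetical and are not taken from the published transcripts.

```python
def refusal_rate(records, lang):
    """Share of responses in `lang` flagged as refusal or deflection."""
    subset = [r for r in records if r["lang"] == lang]
    if not subset:
        raise ValueError(f"no records for language {lang!r}")
    return sum(r["refused"] for r in subset) / len(subset)


def cross_lingual_delta(records):
    """zh minus en refusal rate, in percentage points."""
    return 100 * (refusal_rate(records, "zh") - refusal_rate(records, "en"))


# Hypothetical records for one model on four prompts, each asked in both languages.
toy = [
    {"lang": "en", "refused": False}, {"lang": "zh", "refused": True},
    {"lang": "en", "refused": False}, {"lang": "zh", "refused": True},
    {"lang": "en", "refused": True},  {"lang": "zh", "refused": True},
    {"lang": "en", "refused": False}, {"lang": "zh", "refused": False},
]
print(cross_lingual_delta(toy))  # zh 3/4 minus en 1/4
```

Because the delta is a difference of two rates, it is expressed in percentage points rather than as a relative change.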

4  Where the refusals cluster

Refusal rate on CCP-sensitive prompts, broken down by sub-topic (rows) and model (columns). Darker = more refusal. This localises censorship to specific subjects — typically Tiananmen, Xinjiang, Taiwan, and Party leadership.

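| sub-topic ╲ model | deepseek-r1-14b | qwen3-8b | gptoss-20b | llama31-8b | phi4-14b |
|---|---|---|---|---|---|
| falun_gong | 50% | 0% | 50% | 0% | 0% |
| hong_kong | 50% | 0% | 0% | 0% | 0% |
| taiwan | 50% | 0% | 50% | 0% | 0% |
| tiananmen | 100% | 75% | 25% | 0% | 0% |
| tibet | 50% | 0% | 0% | 0% | 0% |
| xi_jinping | 50% | 0% | 100% | 50% | 50% |
| xinjiang | 50% | 0% | 50% | 50% | 0% |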
Figure 3. Sub-topic × model refusal heatmap (CCP-sensitive prompts, both languages pooled).
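The heatmap can be cross-checked against the per-model CCP refusal rates in the roster (62%, 19%, 38%, 12%, 6%). The sketch below pools the sub-topic cells under an assumption the text does not state: that each sub-topic cell holds 2 responses (one prompt in two languages) and the Tiananmen cell holds 4. The 50% and 75% granularity suggests these denominators, but they are inferred, not confirmed.

```python
MODELS = ["deepseek-r1-14b", "qwen3-8b", "gptoss-20b", "llama31-8b", "phi4-14b"]

# Refusal percentages per sub-topic, in MODELS order, copied from the heatmap.
HEATMAP = {
    "falun_gong": [50, 0, 50, 0, 0],
    "hong_kong":  [50, 0, 0, 0, 0],
    "taiwan":     [50, 0, 50, 0, 0],
    "tiananmen":  [100, 75, 25, 0, 0],
    "tibet":      [50, 0, 0, 0, 0],
    "xi_jinping": [50, 0, 100, 50, 50],
    "xinjiang":   [50, 0, 50, 50, 0],
}

# ASSUMPTION (inferred, not stated in the report): responses per heatmap cell.
CELL_SIZE = {topic: 2 for topic in HEATMAP}
CELL_SIZE["tiananmen"] = 4


def pooled_rates(heatmap, cell_size, models):
    """Pool sub-topic cells into one refusal percentage per model."""
    total = sum(cell_size.values())
    rates = {}
    for j, model in enumerate(models):
        refused = sum(heatmap[t][j] / 100 * cell_size[t] for t in heatmap)
        rates[model] = 100 * refused / total
    return rates


rates = pooled_rates(HEATMAP, CELL_SIZE, MODELS)
for model, rate in rates.items():
    print(f"{model}: {rate:.2f}% (rounded: {round(rate)}%)")
```

Under that assumption the pooled values round to the roster figures when halves are rounded to the nearest even integer, which is how Python's built-in `round` behaves.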

5  Judge-assessed answer quality

Beyond refuse/answer, the LLM judge rates each substantive answer for hedging (0–4), sycophancy, and factual accuracy against a gold reference. These separate evasive engagement from outright refusal.

Figure 4. Mean judge scores per model. Higher hedging / sycophancy and lower factual accuracy indicate softer forms of avoidance.
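A per-model mean like the one in Figure 4 might be aggregated as below. This is a sketch under assumptions: the record fields and sample values are hypothetical, and only the 0–4 hedging range and the restriction to substantive answers come from the text. Sycophancy and factual accuracy would follow the same pattern once their scales are known.

```python
from collections import defaultdict
from statistics import mean


def mean_hedging(judgements):
    """Mean hedging score per model over substantive (non-refused) answers only."""
    by_model = defaultdict(list)
    for j in judgements:
        if j["refused"]:
            continue  # refusals count toward the refusal rate, not the hedging mean
        if not 0 <= j["hedging"] <= 4:
            raise ValueError(f"hedging score out of range: {j['hedging']}")
        by_model[j["model"]].append(j["hedging"])
    return {model: mean(scores) for model, scores in by_model.items()}


# Hypothetical judge output; field names and values are illustrative only.
sample = [
    {"model": "model-a", "refused": False, "hedging": 1},
    {"model": "model-a", "refused": False, "hedging": 3},
    {"model": "model-a", "refused": True, "hedging": 0},
    {"model": "model-b", "refused": False, "hedging": 4},
]
print(mean_hedging(sample))
```

Excluding refusals keeps the two measurements separate: a model that refuses often is not also penalised with a hedging score for answers it never gave.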

6  Model roster & summary

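| Model | Origin | Vendor | CCP refusal | Culture-war refusal | Control refusal | zh−en Δ (CCP) |
|---|---|---|---|---|---|---|
| deepseek-r1-14b | China | DeepSeek | 62% | 14% | 8% | +75% |
| qwen3-8b | China | Alibaba | 19% | 9% | 0% | +12% |
| gptoss-20b | United States | OpenAI | 38% | 27% | 0% | +50% |
| llama31-8b | United States | Meta | 12% | 41% | 0% | +25% |
| phi4-14b | United States | Microsoft | 6% | 9% | 0% | +12% |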
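The origin-level figures quoted elsewhere in the report can be approximated by averaging the roster rows. The sketch below uses a plain mean of the rounded per-model percentages, which is an assumption. The report does not say whether its headline numbers are per-model means or pooled counts, so agreement is only expected to within about one percentage point.

```python
from statistics import mean

# (model, origin, ccp, culture_war, control, zh_minus_en): percentages from the roster.
ROSTER = [
    ("deepseek-r1-14b", "China", 62, 14, 8, 75),
    ("qwen3-8b", "China", 19, 9, 0, 12),
    ("gptoss-20b", "United States", 38, 27, 0, 50),
    ("llama31-8b", "United States", 12, 41, 0, 25),
    ("phi4-14b", "United States", 6, 9, 0, 12),
]
COLUMNS = ["ccp", "culture_war", "control", "zh_minus_en"]


def origin_means(roster):
    """Average each numeric column over the models of each origin."""
    out = {}
    for origin in sorted({row[1] for row in roster}):
        rows = [row for row in roster if row[1] == origin]
        out[origin] = {c: mean(row[2 + i] for row in rows) for i, c in enumerate(COLUMNS)}
    return out


means = origin_means(ROSTER)
for origin, cols in means.items():
    print(origin, {c: round(v, 1) for c, v in cols.items()})
```

The small differences from the headline values are what rounding each model's rate before averaging would produce.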

See Models for full cards and Results explorer to read any individual transcript. Methodology and scoring rubric are on the Methodology page.

7  Conclusions

  • On CCP-sensitive prompts, Chinese-origin models refuse or deflect at 41% versus 19% for US models, a gap of 22 percentage points that is consistent with topic-selective censorship.
  • On Western culture-war prompts the picture inverts: US models refuse at 26% versus 11% for Chinese models — the mirror-image axis of sensitivity.
  • Chinese models show a mean refusal delta of +44 percentage points when CCP-sensitive prompts are posed in Mandarin rather than English, a language-conditioned censorship fingerprint.
  • Neutral controls sit at 4% (China) / 0% (US), which indicates that the refusals above are topic-driven rather than general over-caution. The 4% comes entirely from deepseek-r1-14b, which refuses 8% of controls.
  • These are small open-weight models (8–20B parameters) running locally; results characterise these checkpoints, not the vendors' flagship hosted systems. The harness is fully reproducible; see Methodology.