KoreaTechToday - Korea's Leading Tech and Startup Media Platform

What Counts as “From Scratch”? Korea’s AI Project Faces Its First Real Test

by Minseo Park
PUBLISHED: January 7, 2026 UPDATED: January 8, 2026
in Naver, South Korea, Tech Industry

Plagiarism allegations against leading consortia highlight gaps in rules and raise questions about AI sovereignty



South Korea’s national artificial intelligence (AI) foundation model project, promoted as a cornerstone of the country’s push for AI sovereignty, has entered a critical phase after allegations surfaced that some leading participants relied on components from Chinese AI models. What began as a technical debate among developers has evolved into a broader policy question: how should “development from scratch” be defined in large-scale, government-backed AI projects?

Two of the five consortia selected for the initiative—led by Naver Cloud and Upstage—have been drawn into the controversy. Both were accused by parts of the developer community of falling short of the project’s requirement to independently design and train core AI technologies.

How the controversy unfolded

The issue emerged gradually through technical analyses shared in developer communities.

  • Developers reported unusually high similarity between parts of Naver Cloud’s flagship model and Chinese open-source models.
  • Attention quickly centered on the vision encoder, a key component in multimodal AI systems.
  • Questions followed over whether the use of external encoders violates the project’s “from scratch” requirement.

As these claims gained visibility, the debate moved beyond individual models to the broader issue of whether the government’s evaluation criteria are sufficiently clear and enforceable.

Defining “from scratch” under the national AI project

The controversy has sharpened focus on how the national AI foundation model project defines “from scratch,” a requirement central to the government’s objective of strengthening AI sovereignty. Under the initiative, participating consortia are expected to develop foundation models independently, rather than adapting or fine-tuning overseas systems. However, the project guidelines stop short of clearly specifying how this standard applies to individual components of large, multimodal models.

In general terms, “from scratch” has been understood to include:

  • Domestic design of the model architecture
  • Independent pre-training using locally controlled data and infrastructure
  • Avoidance of direct dependence on pre-trained foreign models

The ambiguity lies in whether this requirement extends to encoders—components that process vision and audio inputs and play a key role in multimodal systems.

This uncertainty has driven disagreement within the industry. One view holds that a model can qualify as “from scratch” if its core reasoning backbone is randomly initialized and trained domestically, even if standardized open-source encoders are used for efficiency and interoperability. An opposing view argues that because encoders influence how information is represented and weighted within the model, reusing pre-trained encoders constitutes partial reliance on foreign technology and should not meet the project’s independence standard.

As a result, “from scratch” has become less a purely technical definition than a policy judgment. The forthcoming evaluation by the Ministry of Science and ICT, which oversees the project, is expected to clarify whether independence will be assessed at the level of the core model alone or across all major components, a decision that could shape the direction of future government-backed AI development efforts.

Focus on Naver Cloud’s vision encoder

Scrutiny around Naver Cloud intensified after claims that the vision encoder of its HyperCLOVA X SEED 32B Think model showed striking similarity to that of Alibaba’s open-source Qwen 2.4 model. According to developers, cosine similarity exceeded 99.5 percent and Pearson correlation approached 99 percent, levels typically associated with near-identical parameter values.
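For readers unfamiliar with the two metrics, the sketch below shows how cosine similarity and Pearson correlation can be computed over two flattened weight vectors. It is illustrative only: the vectors are synthetic random numbers, not weights from HyperCLOVA X or Qwen, and the function names, vector length, and noise level are assumptions chosen for the demonstration, not the method the developers used.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Synthetic stand-ins for flattened encoder weights (hypothetical data).
rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
# A near-copy: the same weights plus small noise, as light further training might leave.
near_copy = [w + rng.gauss(0.0, 0.05) for w in base]
# An unrelated vector drawn independently from the same distribution.
independent = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

print("near copy  :", cosine_similarity(base, near_copy), pearson_correlation(base, near_copy))
print("independent:", cosine_similarity(base, independent), pearson_correlation(base, independent))
```

With these synthetic inputs, the lightly perturbed copy scores above 0.99 on both measures while the independently drawn vector scores close to zero, which is why values this high are usually read as a sign of shared origin rather than coincidence.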

A vision encoder converts images and video into numerical signals an AI system can process. While it does not perform reasoning itself, it plays a central role in multimodal models that integrate text, visuals, and audio. This has led critics to argue that encoders are not peripheral tools, but structurally important components that influence how models interpret information.
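As a rough illustration of what “converting images into numerical signals” can mean, the sketch below cuts a tiny grayscale image into square patches and maps each patch to a short vector with a linear projection. This patch-and-project design is one common approach and is assumed here for illustration; the image size, patch size, embedding width, and random weights are all hypothetical and do not describe Naver Cloud’s or Alibaba’s encoder.

```python
import random


def image_to_patches(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch x patch tiles."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tile = [image[top + r][left + c] for r in range(patch) for c in range(patch)]
            patches.append(tile)
    return patches


def project(patches, weights):
    """Map each flattened patch to an embedding; weights has one row per output dimension."""
    return [
        [sum(w * x for w, x in zip(row, tile)) for row in weights]
        for tile in patches
    ]


rng = random.Random(0)
# Hypothetical 8 x 8 image with pixel intensities in [0, 1).
image = [[rng.random() for _ in range(8)] for _ in range(8)]
patches = image_to_patches(image, 4)  # four tiles of 16 pixels each
# Hypothetical projection weights: 6 output dimensions x 16 inputs.
weights = [[rng.gauss(0.0, 0.1) for _ in range(16)] for _ in range(6)]
embeddings = project(patches, weights)

print(len(embeddings), "patch embeddings of width", len(embeddings[0]))
```

In a trained encoder the projection weights are learned rather than random, which is the sense in which critics say the encoder shapes how a model represents what it sees.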

Naver Cloud’s explanation

Naver Cloud acknowledged the use of external open-source modules but rejected accusations of copying another company’s model. The company said the decision was a technical and engineering choice aimed at improving system stability and compatibility with global AI ecosystems, rather than a shortcut driven by capability gaps.

“The foundation model, which is responsible for reasoning and identity, is fully proprietary,” a company representative said, adding that the vision encoder is modular and can be replaced with Naver Cloud’s own technology as development continues. The company also noted that it already possesses proprietary visual technologies and plans to integrate them over time.

Technical disclosures deepen the debate

The discussion escalated after Naver Cloud published a technical report on the open-access research platform arXiv. The paper disclosed that the vision encoder in HyperCLOVA X 8B Omni is based on Alibaba’s Qwen2.5-VL architecture, while the audio encoder draws from Whisper, developed by OpenAI.

Naver Cloud reiterated that encoders serve as data translation tools and do not define a model’s intelligence. However, industry sources point out that the vision encoder accounts for an estimated 12 percent of the total parameters in HyperCLOVA X SEED 32B, complicating claims that it should be treated as secondary in “from-scratch” evaluations.
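To put the 12 percent estimate in absolute terms, the arithmetic can be sketched as follows. The calculation assumes that “32B” denotes roughly 32 billion total parameters and that the estimate is measured against that total; the article does not state an exact parameter count, and the function names are hypothetical.

```python
def implied_params(total_params, share_percent):
    """Parameter count implied by a percentage share of the total."""
    return total_params * share_percent / 100


def share_of_total(component_params, total_params):
    """Percentage of the total parameter count taken up by one component."""
    return 100 * component_params / total_params


# Assumption: "32B" is read as about 32 billion parameters in total.
TOTAL_PARAMS = 32e9
ENCODER_SHARE_PERCENT = 12  # estimate attributed to industry sources

encoder_params = implied_params(TOTAL_PARAMS, ENCODER_SHARE_PERCENT)
print(f"Implied encoder size: {encoder_params / 1e9:.2f} billion parameters")
print(f"Share check: {share_of_total(encoder_params, TOTAL_PARAMS):.1f} percent")
```

Under those assumptions the encoder would hold about 3.84 billion parameters, a scale that helps explain why critics resist treating it as a minor add-on.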

Upstage case offers a contrasting outcome

Upstage also faced short-lived scrutiny over its Solar Open 100B model after Sionic AI chief Ko Suk-hyun claimed that parts of the system resembled China-based Zhipu’s GLM-4.5-Air. The company moved quickly to counter the allegations by releasing details of its development workflow and organizing a live verification session with external experts.

During the session, Upstage CEO Kim Sung-hoon said the overlapping section accounted for just 0.0004 percent of the total network and described the similarity as statistically insignificant. Ko later issued a public apology, and scrutiny around Upstage largely subsided.

What the government now has to decide

Attention has since shifted to the Ministry of Science and ICT, which is overseeing the project and is expected to complete its first round of evaluations by Jan. 15, eliminating one of the five consortia.

At the core of the review are unresolved questions that could shape future AI policy:

  • Whether external encoders can be used if the core model is independently developed
  • How much of a model’s total parameters can rely on outside components
  • Whether open-source reuse should be treated differently from fine-tuning foreign models

Broader implications for Korea’s AI strategy

The government’s handling of the current controversy is expected to set a benchmark for how South Korea evaluates technological independence in future AI initiatives. A strict interpretation of “from scratch” development—one that limits the use of externally developed components even at the encoder level—could push domestic firms toward deeper self-reliance. However, industry experts warn this approach may slow development timelines, increase costs, and make it harder for Korean models to remain competitive against global players that actively build on open-source ecosystems.

On the other hand, a more flexible interpretation that allows selective reuse of open-source components could accelerate innovation and reduce development risk, especially in complex areas such as multimodal AI. The trade-off is reputational and strategic: critics argue that excessive dependence on foreign technology, even when legally permitted, could weaken the credibility of claims around AI sovereignty and undermine the original goals of the national project.

At a broader level, the debate highlights a structural challenge facing AI policymakers. Modern AI systems are rarely built in isolation; they are assembled within globally shared research frameworks, open-source libraries, and cross-border collaboration. South Korea’s national foundation model project is therefore emerging as a test case not just for technical execution, but for how the country defines sovereignty in an ecosystem where independence and interdependence are increasingly difficult to separate.

Tags: controversy, From scratch, National AI Project, Naver Cloud, South Korea, Upstage

Copyright © 2024 KoreaTechToday
