头部财经
4 hours ago
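Beihang open-sources Code2Bench: dual-scaling dynamic evaluation ends "lie-flat" score farming for code LLMs
To break this "illusion of high scores," a research team from Beihang University (Beijing University of Aeronautics and Astronautics) has proposed a new philosophy for building benchmarks, Dual Scaling, and used it to build Code2Bench, an end-to-end automated framework. The work aims to establish a new paradigm for evaluating code large language models that is more dynamic, more rigorous, and more diagnostic.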