HPCwire's Michael Feldman on the ShenWei 1600

Source: Baidu Wenku   Editor: 超级军网 (Super Military Net)   Time: 2024/04/18 14:43:52
Sorry, I can't post links. The article text follows. Date: November 01, 2011

If anyone wasn't taking China seriously as a contender for supercomputing supremacy, such doubts should have been dispelled last week when the New York Times reported that the nation has deployed its first petascale supercomputer built with domestically produced CPUs. And it's not just the processors that were homegrown. Based on a presentation delivered last month at China's Annual Meeting of National High Performance Computing, most of the major components of the new machine were designed and built domestically, including the liquid cooling technology, the system network, and the software stack.

As we recapped last week, the Sunway BlueLight MPP, installed in September at the National Supercomputer Center in Jinan, is powered by 8,704 ShenWei SW1600 processors. The resulting machine delivers just over a petaflop of peak performance, with a Linpack rating of 796 teraflops. That will probably place it somewhere between 15th and 20th on the upcoming TOP500 list, assuming the engineers at Jinan sent their submission in on time.
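
Linpack efficiency is the measured Linpack result (Rmax) divided by the theoretical peak (Rpeak). A minimal sketch, using the 796-teraflop Linpack rating above and the 1.07-petaflop peak the article cites for the machine:

```python
# Linpack efficiency = measured Rmax / theoretical Rpeak.
# Both figures are the ones quoted in the article, in teraflops.
RMAX_TFLOPS = 796.0    # Linpack rating
RPEAK_TFLOPS = 1070.0  # claimed peak, 1.07 petaflops


def linpack_efficiency(rmax: float, rpeak: float) -> float:
    """Return Rmax / Rpeak as a fraction between 0 and 1."""
    if rpeak <= 0:
        raise ValueError("peak performance must be positive")
    return rmax / rpeak


eff = linpack_efficiency(RMAX_TFLOPS, RPEAK_TFLOPS)
print(f"Linpack efficiency: {eff:.1%}")
```

The result, about 74%, agrees with the efficiency column listed for this system in the table later in the thread.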


Impressively, its power consumption of just one megawatt makes it one of the more power-efficient CPU-only supercomputers in the world. Running Linpack, BlueLight delivers 741 megaflops/watt, which would place it in the top ten of the current Green500, a list that ranks supercomputers by energy efficiency.
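
The 741 megaflops/watt figure and the 796-teraflop Linpack rating together imply the power drawn during the run. A quick check, taking both inputs from the article:

```python
RMAX_FLOPS = 796e12              # Linpack rating: 796 teraflops
EFFICIENCY_FLOPS_PER_W = 741e6   # 741 megaflops per watt

# Power = performance / (performance per watt)
implied_power_w = RMAX_FLOPS / EFFICIENCY_FLOPS_PER_W
print(f"Implied power during Linpack: {implied_power_w / 1e6:.2f} MW")
```

The result, about 1.07 MW, is consistent with the "just one megawatt" quoted above.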

Perhaps even more impressively, this was all accomplished with CPUs built on 65nm process technology, which is two generations behind what most of the major fabs can offer today. According to the presentation, the domestic ShenWei chip is a 16-core, 64-bit RISC processor running at between 0.975 and 1.2 GHz. Assuming a frequency of 1.1 GHz, the CPU can deliver a peak double-precision floating-point performance of 140.8 gigaflops. Note that if the 8,704 CPUs were running at that speed, the machine would actually deliver 1.2 peak petaflops, not the claimed 1.07 petaflops. Apparently the supercomputer is equipped with processors clocked at the lower end of their frequency range.
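
Peak double-precision performance is CPUs × cores × clock × floating-point operations per cycle. The article does not state the per-core flops per cycle; the sketch below infers it from the quoted 140.8 gigaflops at 1.1 GHz (140.8 / (16 × 1.1) = 8), which is an assumption, and then scales it to the 8,704-CPU machine at a few clock speeds:

```python
# Peak DP performance = CPUs x cores x clock x flops per cycle.
CORES_PER_CPU = 16
NUM_CPUS = 8_704
CPU_PEAK_GFLOPS_AT_1_1GHZ = 140.8  # per-CPU figure quoted in the article

# Assumption: flops per core per cycle is inferred from the quoted figure,
# 140.8 / (16 * 1.1) = 8; the source does not state it directly.
FLOPS_PER_CYCLE = round(CPU_PEAK_GFLOPS_AT_1_1GHZ / (CORES_PER_CPU * 1.1))


def system_peak_pflops(clock_ghz: float) -> float:
    """Theoretical peak of the whole 8,704-CPU machine, in petaflops."""
    per_cpu_gflops = CORES_PER_CPU * clock_ghz * FLOPS_PER_CYCLE
    return NUM_CPUS * per_cpu_gflops / 1e6


for ghz in (0.975, 1.1, 1.2):
    print(f"{ghz:.3f} GHz -> {system_peak_pflops(ghz):.2f} PF peak")

# Clock at which the machine would hit exactly the claimed 1.07 PF.
implied_ghz = 1.07e6 / (NUM_CPUS * CORES_PER_CPU * FLOPS_PER_CYCLE)
print(f"Clock implied by 1.07 PF: {implied_ghz:.3f} GHz")
```

Under that assumption the three clocks give roughly 1.09, 1.23 and 1.34 PF, and the claimed 1.07 PF corresponds to about 0.96 GHz, at (in fact marginally below) the bottom of the quoted range, which is consistent with the article's conclusion.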

Digging a little deeper into the specs, the CPU is a four-issue superscalar design with two integer and two floating-point execution units. The integer unit has a 7-stage pipeline, while the floating point unit is implemented as a 10-stage pipeline. The system bus is 128 bits wide.

As is the case with most CPUs nowadays, the chip contains an integrated DDR3 memory controller. It feeds the 16 cores at a rate of up to 68 GB/second, using four memory channels. Each of the machine's CPUs is directly connected to 16 GB of memory, although the ShenWei's maximum memory reach is a whopping 1 TB (and 8 TB for virtual memory).
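
Dividing memory bandwidth by peak compute gives the machine balance in bytes per flop, a common way to judge whether the cores are likely to be starved for data. A minimal sketch using the 68 GB/s and four channels quoted above, plus the 140.8-gigaflop per-CPU peak at 1.1 GHz quoted earlier:

```python
MEM_BW_GBPS = 68.0   # peak memory bandwidth per CPU
MEM_CHANNELS = 4
CORES = 16
PEAK_GFLOPS = 140.8  # per-CPU peak at 1.1 GHz, per the article

per_channel = MEM_BW_GBPS / MEM_CHANNELS   # bandwidth per DDR3 channel
per_core = MEM_BW_GBPS / CORES             # bandwidth share per core
bytes_per_flop = MEM_BW_GBPS / PEAK_GFLOPS  # machine balance

print(f"{per_channel:.1f} GB/s per channel, {per_core:.2f} GB/s per core")
print(f"Machine balance: {bytes_per_flop:.2f} bytes/flop")
```

At roughly half a byte per flop, bandwidth-heavy kernels on this chip would likely be limited by memory rather than by the floating-point units.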

The chip also contains Level 1 and Level 2 caches: 8 KB each of instruction and data cache for L1, and 96 KB for L2. Those are rather small by modern CPU standards, but considering the relatively large geometries of 65nm transistors, there probably wasn't room for both large caches and lots of cores. In this case, the chip architects opted to maximize core count.

Design of the ShenWei microprocessors is attributed to the Jiangnán Computing Research Lab, with support from the Shandong government. The chips themselves are being fabbed by "a company in Shanghai," which plans to move from the current 65nm process node to 45nm. According to the Wikipedia entry on the ShenWei, this is the third generation of the architecture.

The CPUs are rather densely packed in the BlueLight system. Each 1U box crams together four dual-socket motherboards, which is about two to four times the density of a typical design. Normally that would make for an uncomfortably hot enclosure, so to compensate, the system is entirely water cooled. From the pictures in the presentation, it looks like piped liquid is run through the motherboard to maximize heat dissipation.

Each node, which they refer to as a super node, consists of 256 CPUs (4,096 cores) and 4 TB of memory, providing 32.7 teraflops of peak performance. Intra-node communication is carried over a high-speed backplane that delivers 1 terabyte/second of bandwidth.
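
The super node figures follow from the per-CPU specs given earlier (16 cores and 16 GB per CPU). A minimal sketch that recomputes them; the 8 flops per core per cycle is an assumption inferred from the article's 140.8 GF at 1.1 GHz, not a figure the source states:

```python
CPUS_PER_SUPERNODE = 256
CORES_PER_CPU = 16
MEM_PER_CPU_GB = 16          # memory attached directly to each CPU
TOTAL_CPUS = 8_704
SUPERNODE_PEAK_TFLOPS = 32.7
# Assumption: 8 DP flops per core per cycle, inferred from the article's
# 140.8 GF per 16-core CPU at 1.1 GHz; the source does not state it directly.
FLOPS_PER_CYCLE = 8

cores = CPUS_PER_SUPERNODE * CORES_PER_CPU
memory_tb = CPUS_PER_SUPERNODE * MEM_PER_CPU_GB / 1024
supernodes, leftover = divmod(TOTAL_CPUS, CPUS_PER_SUPERNODE)
implied_ghz = SUPERNODE_PEAK_TFLOPS * 1e3 / (cores * FLOPS_PER_CYCLE)
system_peak_pflops = supernodes * SUPERNODE_PEAK_TFLOPS / 1e3

print(f"{cores} cores, {memory_tb:.0f} TB per super node")
print(f"{supernodes} super nodes ({leftover} CPUs left over)")
print(f"Clock implied by 32.7 TF: {implied_ghz:.2f} GHz")
print(f"{supernodes} x 32.7 TF = {system_peak_pflops:.2f} PF")
```

The 8,704 CPUs divide evenly into 34 super nodes, and 32.7 TF per super node implies a clock of about 1.0 GHz. Note that 34 × 32.7 TF comes to about 1.11 PF, a bit above the 1.07 PF claimed for the whole system; the source does not reconcile these figures.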

The system network is the most conventional part of the machine, being based on QDR InfiniBand. In this case, the engineers built custom-made 256- and 324-port switches, and outfitted the connections with optical fiber. The network is a fat-tree topology and is designed for optimized routing as well as dynamic fault tolerance. It's not clear if Mellanox or QLogic components are in the mix here, but no mention was made of third-party switch ASICs or NICs.
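
The article says only that the network is a fat tree built from custom 256- and 324-port switches. For a sense of what such switch radices allow, here is a minimal sketch of the textbook sizing rule for a non-blocking two-level fat tree of identical k-port switches: k leaf switches, each with k/2 ports down to endpoints and k/2 up (one link to each of k/2 spine switches), for at most k²/2 endpoints. This is a generic illustration; the source does not describe how BlueLight's switches are actually wired.

```python
def two_level_fat_tree(k: int) -> dict:
    """Capacity of a non-blocking two-level fat tree of k-port switches.

    Each leaf uses k/2 ports down (endpoints) and k/2 ports up, one to
    each spine; each spine has one port per leaf, so there are k leaves.
    """
    if k <= 0 or k % 2:
        raise ValueError("port count must be a positive even number")
    leaves, spines = k, k // 2
    return {"leaves": leaves, "spines": spines, "endpoints": leaves * (k // 2)}


for ports in (256, 324):
    print(ports, two_level_fat_tree(ports))
```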

The software stack is attributed to Sunway, which has provided the "virtualization" management, a parallel operating system, the parallel file system, the compiler for the ShenWei CPUs, multicore math libraries, and a Java support platform. Compiler support includes the usual suspects: C, C++, and Fortran, as well as UPC and OpenMP. The requisite MPI library rounds out the software stack.

With the ShenWei CPU, China has begun the process of edging out foreign-built processors with its own designs. The BlueLight machine is the first supercomputer on China's TOP100 list with homegrown CPUs. As it stands now, 85 of those systems use Intel processors, with the remaining 14 using AMD parts. It's clearly China's intent to reduce, or perhaps even eliminate entirely, its dependence on processors designed outside its borders, at least for its HPC needs.

In aggregate, the Chinese have built what appears to be a world-class supercomputer, designed and built without the help of any US-based chipmakers or system vendors. The Japanese, of course, accomplish this on a fairly regular basis, the latest example being the K supercomputer at RIKEN. By contrast, Europe possesses only an incomplete domestic HPC industry, with system vendors like Bull relying on externally sourced CPUs, interconnects, and other components. For China, a relative newcomer to the world of high-end HPC, designing and building a domestic supercomputer is a major achievement.

Should vendors be worried? Certainly chipmakers like Intel, AMD, and NVIDIA should view this development with some trepidation. Likewise for HPC system vendors such as IBM, HP, Dell and others. China is a large and growing market for high performance computing infrastructure, and if the Chinese decide to take a homegrown approach to HPC technology, that could translate into hundreds of millions of dollars per year in lost revenue for these US-based companies.

As for the broader picture of US (and European) competitiveness in HPC capability, there is also reason for concern. A number of industry insiders believe the Chinese are determined to beat the US and other nations in the race to exaflops. Convey co-founder and chief scientist Steve Wallach is one such individual. According to him, the dense packaging, impressive performance-per-watt metrics, and water-cooling technology of the BlueLight system are signs of serious engineering prowess on the part of the Chinese engineers.

"This is ground-up design," Wallach told HPCwire. "They own the technology, and that's the key."

More importantly, he believes the technology can scale more easily than the mainstream products offered in HPC today. In particular, if the Chinese catch up to more advanced fab technology (or outsource to it), the ShenWei processors could be quite formidable. According to Wallach, compared to a 65nm die, 32nm technology would provide four times the available silicon real estate, freeing the ShenWei designers to add more cache, something he believes is a weakness in the current design.
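
Wallach's "four times" is the usual rule of thumb that transistor density scales with the square of the linear feature size. A quick check, assuming ideal scaling:

```python
# Ideal scaling: transistors per unit area grow with the inverse square
# of the feature size. Real processes rarely scale this cleanly.
OLD_NODE_NM = 65
NEW_NODE_NM = 32

area_gain = (OLD_NODE_NM / NEW_NODE_NM) ** 2
gain_45 = (OLD_NODE_NM / 45) ** 2  # the 45nm step planned by the Shanghai fab

print(f"65nm -> 32nm: {area_gain:.2f}x the transistors in the same area")
print(f"65nm -> 45nm: {gain_45:.2f}x")
```

(65/32)² is about 4.13, so "four times" holds under ideal scaling; the intermediate 45nm node would give only about 2.09x.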

A more obvious advantage is that, rather than relying on commodity processors and commercial clusters, the Chinese government seems willing to develop processors and systems targeted specifically at HPC. The Japanese government has done this to some extent with the aforementioned K machine and the NEC vector machines, but in the US and Europe, there is no direct government support to fund HPC processors, and only piecemeal support from various agencies to design and build advanced supercomputing systems.

In that sense, the Chinese can exploit their considerable financial resources to outrun the competition if they choose to do so. And if the new ShenWei processor and the BlueLight system are an indication of a systematic strategy, then the Chinese have already made that choice.

Original article:
hpcwire.com/hpcwire/2011-11-01/china_s_indigenous_supercomputing_strategy_bears_first_fruit.html
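Why is the cache so pitiful!
The software side isn't there yet either. With caches like these, it would be hard to catch up even with 32 cores, never mind 16.
ShenWei's engineers have said they plan to build 64-core and 256-core CPUs.
Moving from a 65nm to a 32nm process would allow 64 cores with the same cache size. SMIC probably won't reach that process until next year or the year after; its best process at present is only 45nm.
The process isn't good enough, so the cache can't be made bigger.
Ranked by MFLOPS per watt (the first column is the TOP500 speed rank):
Only the ShenWei 1600 system, 14th by speed, uses Chinese CPUs, and its efficiency is slightly below that of Japan's K computer, which is first by speed.
All the other Chinese supercomputers use foreign chips.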

Rank  Rmax (GFLOPS)  Eff. (%)  Power (kW)  MFLOPS/W  Country         Manufacturer
64           172494     82.25       85.12      2026  United States   IBM
65           172494     82.25       85.12      2026  United States   IBM
29           339834     81.02      170.25      1996  United States   IBM
17           677104     80.72       340.5      1989  United States   IBM
284           65347     62.32       38.67      1690  United States   IBM
114          103200     56.43        81.5      1266  Spain           Bull SA
21           496500     49.03         540       919  China           IPE, Nvidia, Tyan
111          106300     54.84      117.91       902  United States   Hewlett-Packard
82           142700     48.67         160       892  Italy           IBM
256           68005     48.15       76.25       892  Germany         IBM
61           176700      85.5      198.72       889  United States   Appro International
134           93647     38.24      108.15       866  United States   IBM
135           93647     38.24      108.15       866  United States   IBM
155           85843     38.24       99.14       866  United States   IBM
156           85843     38.24       99.14       866  United States   IBM
157           85843     38.24       99.14       866  United States   IBM
158           85843     38.24       99.14       866  United States   IBM
159           85843     38.24       99.14       866  United States   IBM
160           85843     38.24       99.14       866  United States   IBM
161           85843     38.24       99.14       866  United States   IBM
162           85843     38.24       99.14       866  United States   IBM
163           85843     38.24       99.14       866  United States   IBM
164           85843     38.24       99.14       866  United States   IBM
165           85843     38.24       99.14       866  United States   IBM
166           85843     38.24       99.14       866  United States   IBM
48           218100     83.17      252.16       865  United States   Appro International
5           1192000     52.11     1398.61       852  Japan           NEC/HP
45           236100     80.62       281.6       838  United States   Appro International
15           773700      80.5      924.16       837  United States   Appro International
1          10510000     93.17       12659       830  Japan           Fujitsu
234           70430     47.48       91.98       766  Taiwan          Asus
14           795900     74.37        1074       741  China           National Research Center of Parallel Computer Engineering & Technology
235           70280     43.77       96.03       732  United States   Hewlett-Packard
33           299300     58.86      416.78       718  Germany         Clustervision/Supermicro
212           75295     52.54      105.94       711  Australia       Xenon Systems
16           771700     57.47     1155.07       668  China           NUDT
2           2566000     54.58        4040       635  China           NUDT
104          109200     91.91       176.4       619  United States   Intel
20           565700     79.01         972       582  United States   Cray Inc.
221           73350     80.39       129.6       566  Japan           Hitachi
4           1271000     42.59        2580       493  China           Dawning
354           58310     73.76      120.56       484  United States   IBM
228           72030      69.6         154       468  Spain           IBM
89           126500     78.16         276       458  United States   IBM
10          1042000     75.74        2345       444  United States   IBM
46           230600     67.64       540.4       427  United States   Appro International
140           89670     88.25      212.62       422  United States   IBM
331           60729     88.25         144       422  United States   IBM
233           70760      81.9      172.38       410  United States   IBM
296           64327     81.88      156.71       410  Poland          IBM
483           51462     81.88      125.37       410  United Kingdom  IBM
117          102044     81.18      250.74       407  United Kingdom  IBM
83           136300     88.96         337       404  Canada          IBM
365         56790.8     88.25         141       403  United States   IBM
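
The MFLOPS/W column is simply Rmax divided by power. With Rmax in GFLOPS and power in kW, the ratio GFLOPS/kW equals MFLOPS/W directly, since both numerator and denominator are scaled by the same factor of 10^6 relative to flops and watts. A minimal sketch that recomputes it for a few rows copied from the table above:

```python
# (rank, Rmax in GFLOPS, power in kW, listed MFLOPS/W), copied from the table.
ROWS = [
    (1, 10_510_000, 12_659, 830),  # Japan, Fujitsu (the K computer)
    (2, 2_566_000, 4_040, 635),    # China, NUDT
    (14, 795_900, 1_074, 741),     # China, the ShenWei-based system
]

for rank, rmax_gflops, power_kw, listed in ROWS:
    # GFLOPS / kW == MFLOPS / W, so no unit conversion is needed.
    computed = rmax_gflops / power_kw
    print(f"rank {rank:>2}: computed {computed:.0f} MFLOPS/W, listed {listed}")
```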


This is good info.
