ISCA'10

Source: 百度文库 (Baidu Wenku) | Editor: 超级军网 | Time: 2024/04/24 16:01:34

http://dl.acm.org/citation.cfm?doid=1815961.1815985
Debugging parallel programs is a well-known difficult problem. A promising way to facilitate debugging parallel programs is to use hardware support to achieve deterministic replay. A hardware-assisted deterministic replay scheme should have a small log size, as well as a low design cost, to be feasible for adoption by industrial processors. To achieve these goals, we propose a novel and succinct hardware-assisted deterministic replay scheme named LReplay. The key innovation of LReplay is that, instead of recording the logical time orders between instructions or instruction blocks as previous investigations did, LReplay is built upon recording the pending period information [6]. According to the experimental results on Godson-3, the overall log size of LReplay is about 0.55B/K-Inst (bytes per thousand instructions) for sequential consistency, and 0.85B/K-Inst for Godson-3 consistency. This log size is an order of magnitude smaller than that of state-of-the-art deterministic replay schemes incurring no performance loss. Furthermore, LReplay consumes only about 1.3% of the area of Godson-3, since it requires only trivial modifications to the existing components of Godson-3. The above features of LReplay demonstrate the potential of integrating hardware-assisted deterministic replay into future industrial processors.
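
To get a feel for the log rates quoted in the abstract, here is a rough back-of-the-envelope sketch in Python. Only the two rates (0.55 and 0.85 bytes per thousand instructions) come from the abstract; the function name `log_bytes` and the one-billion-instruction workload are assumptions made for illustration, and the abstract does not say whether the rate is per core or for the whole chip.

```python
def log_bytes(instructions: int, bytes_per_kinst: float) -> float:
    """Estimated replay-log size in bytes for a given instruction count."""
    return instructions / 1000 * bytes_per_kinst


# Rates quoted in the abstract (bytes per thousand instructions).
SC_RATE = 0.55        # sequential consistency
GODSON3_RATE = 0.85   # Godson-3 consistency

# Hypothetical workload (an assumption): one billion executed instructions.
N = 1_000_000_000

print(f"SC:       {log_bytes(N, SC_RATE) / 1e3:.0f} KB")
print(f"Godson-3: {log_bytes(N, GODSON3_RATE) / 1e3:.0f} KB")
```

At these rates, a billion instructions would produce a log of roughly 550 KB under sequential consistency and roughly 850 KB under Godson-3 consistency.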

http://blog.csdn.net/wahaha_nescafe/article/details/8506884
I studied computer architecture and microprocessors and I work in chip design, so I want to write down some of my ideas and reflections.

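When it comes to computer architecture, microprocessors, and chip design, I think the people in China who should be writing the most are those at the Institute of Computing Technology of the Chinese Academy of Sciences (ICT) and at the National University of Defense Technology (NUDT). ICT's Loongson (龙芯) is quite good, and it recently had a paper at ISSCC. NUDT's work is also very good and holds up well in use; their FT processors and Tianhe have made a name for themselves. Tsinghua and Peking University are of course very good too; Fudan seems to have been consistently good at frequency synthesis; and Zhejiang University is good on the systems side. What I write represents only my own views, not any official position, so please don't flame me as you read.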

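On chip design: I recently looked at the 2012 "China Chip" (中国芯) selection. This selection is supposed to represent the chips with the highest technical level and the strongest market capability in China, yet after looking at it I lost all confidence in domestic chip design, especially in commercial chips.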

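As someone in the chip development industry, I know quite a few of the "China Chip" entries reasonably well. Looking at what was selected, all I can do is sigh. It seems domestic chip development can manage chips that enjoy policy protection, but otherwise simply cannot compete with foreign vendors. Social security card chips, ID card chips, bank card chips: fine, make those. Processors, communication chips, analog chips: forget it, what can be done? There is not enough accumulated technology, not enough accumulated process experience, researchers cannot settle down to the work, and companies do not have the funding to keep going.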

I don't know whether domestic chip design will simply fall from here or still has a chance to catch up. Let me count on my fingers:

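- PC processors: not possible; Intel and AMD are going strong. The SPARC and MIPS designs made in China can only serve specialized industry markets and probably won't reach consumers' hands. A while ago I saw the Apple-like machine made by 龙芯梦兰 (Loongson Menglan); it too will probably be sold only to industry customers;
- GPUs: no, those are still in American hands;
- Mobile processors: probably not possible either; Qualcomm is going strong. In China, counting Taiwan, there are still one or two chips, but as everyone knows, mobile processors are really the domain of ARM and Imagination, with no share of domestically developed IP. The one from Fuzhou is presumably built on an ARM license with Synopsys brought in to do the work;
- Automotive chips: apparently not either. Cars are used for decades, and automotive chips have to be guaranteed for 15 years; there is probably no domestic chip vendor that can survive 15 years. That is Infineon's domain;
- Analog? There seem to be dozens of power-management chip companies in China, but they all compete on cost; technically they cannot even see the backs of TI and Linear Technology;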

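So what chips does China still have?
I have to mention Loongson again: it is a chip that has succeeded in technology and in training talent, but not commercially, or at least it is still hard to call it a commercial success. I am optimistic about Loongson in embedded applications, though. The Loongson folks should drop their consumer-electronics plans and stop making laptops; that was hype from reporters! Loongson taking things slowly and training domestic talent is actually quite good. It was probably just unscrupulous reporters writing nonsense! A while ago I even saw an article titled "Step aside Intel, Loongson is here"! Sigh... But thinking about it again, the people Loongson trained mostly seem to have ended up at foreign companies, working for foreigners, which is really depressing.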

I work in chip design and have developed several chips, so I have some standing to talk about chip development.

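Chip design today basically means writing code in Verilog or VHDL, so chip designers are code monkeys too! When learning Verilog/VHDL, many people read 夏宇闻 (Xia Yuwen)'s book. My view, however, is that you should go through the whole syntax as fast as possible (it is actually a lot like C) and then follow the LEON code development style!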

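About LEON: it has given China a great deal of "innovation". As far as I know, some domestic companies relied on it to claim they had developed chips with independent intellectual property, and then went public; some research institutes developed chips "with independent intellectual property" that are said to be used in certain key departments; and some research organizations have built many things on LEON. I have to say that LEON has supplied far too much material for CPU design in China! One has to admire the LEON people for being willing to open-source it and to let people in China use it.
So why is the LEON coding style worth learning from?
- In practice, it is the style with the least chance of bugs;
- In current chip development, we need not agonize over whether one coding style synthesizes to a couple fewer NAND gates than another, because a few extra NAND gates no longer matter much on a chip today; the RAM in many chips takes up far more area; and today's EDA tools are smart enough to do a great deal for us;
- The LEON style is also the most favorable for team development: it gives every team member a uniform interface to develop against, which greatly helps large-scale development;
- Finally, LEON development can be said to move from a C-style approach up to a C++-style object approach, and for large systems C++ objects are naturally easier to understand;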
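
The post does not show any LEON code. As a rough illustration of what a record-based, "object-like" style can look like, here is a sketch in Python rather than VHDL, loosely modeled on the two-process method that Gaisler's LEON/GRLIB sources are known for: all registers live in one record, all inputs in another, one combinational function computes the next register values, and one clocked step stores them. The 4-bit counter and every name here (`Regs`, `Inputs`, `comb`, `simulate`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Regs:
    """All state of the module in one record (hypothetical 4-bit counter)."""
    count: int = 0
    wrapped: bool = False


@dataclass(frozen=True)
class Inputs:
    """All input ports of the module in one record."""
    enable: bool = False
    reset: bool = False


def comb(r: Regs, i: Inputs) -> Regs:
    """Combinational part: next register values from current state and inputs."""
    v = r  # start from the current values, then override what changes
    if i.enable:
        nxt = (r.count + 1) % 16
        v = replace(v, count=nxt, wrapped=(nxt == 0))
    if i.reset:
        v = Regs()  # reset takes priority over enable
    return v


def simulate(inputs):
    """Sequential part: on each 'clock edge' the registers take the next values."""
    r = Regs()
    trace = []
    for i in inputs:
        r = comb(r, i)
        trace.append(r.count)
    return trace


# Three enabled cycles followed by a reset.
print(simulate([Inputs(enable=True)] * 3 + [Inputs(reset=True)]))
```

Adding a register or a port in this style means adding a field to one record, so the function signatures that other team members depend on stay the same, which is one plausible reading of the "uniform interface" point above.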

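The full text of the previous post is at the link below; it is too long and makes for rather grim reading. I wonder where the talent Loongson trained has gone?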

http://bbs.sciencenet.cn/thread-1127064-2-1.html

http://blog.csdn.net/wahaha_nescafe/article/details/8506884
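I studied computer architecture and microprocessors and I work in chip design, so ...
I work in analog and RF. In China, Spreadtrum (展讯) and RDA are growing fast in RF.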

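Gunslinger posted on 2013-4-6 22:15
I work in analog and RF. In China, Spreadtrum (展讯) and RDA are growing fast in RF.
The author of that post may not be in China and probably doesn't know the domestic situation very well. But the views on CPU design are still at "砖家" level (a mocking pun on the word for "expert").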

CPU design effort in China is fragmented, which is rather sad.

ISCA'10: overview of the talk given at ISCA on June 25 by Dr. Hu Weiwu (胡伟武), designer of the Loongson CPU