Opening up lab projects from world-class universities, so that any young person with an interest in research and graduate study can work alongside leading scholars to change the world, at any time and from any place. Doesn't that sound cool? It is exactly what 清帆 is doing!

The 清帆 project "Optimal Estimation of the Differential Entropy", published by the Department of Electrical Engineering at Stanford University, recently came to a successful close. Team njukym from Nanjing University caught the attention of the project's faculty lead with their outstanding work, and they have now been formally invited to join the Stanford lab's ongoing research and paper preparation. How exactly did they stand out? Let's hear from the njukym team leader together with 小帆~


——Why did you choose to join such a difficult project in the first place?

My career goal is to continue on to graduate study abroad after my bachelor's degree, learn more about machine learning, statistics, and data analysis, and eventually land a core role in related work or projects. So I had been looking for activities and competitions to enrich my experience.

Machine learning actually requires a very broad knowledge base. Completing this project gave me hands-on experience with information theory, statistics, and algorithmic experiments, which has been a real help to my studies. On top of that, the workload of a 清帆 project is split up quite sensibly: by finishing the tasks week by week, you ease into the field step by step and come away with new insights every week.



Week 1 (project task): Understand how one could simulate samples from a specific density f. Compute the differential entropy of this density, and implement a sampler that can produce i.i.d. samples from this density. (Screenshots of the team's work followed here.)
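Since the screenshots themselves are not reproduced in this post, here is a minimal sketch of the kind of solution Week 1 asks for. The density below is a hypothetical stand-in (the team's actual f is not shown); the sampler uses inverse-transform sampling, where applying the inverse CDF to uniform draws yields i.i.d. samples from f:

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical example density on [0, 1]; the team's actual f is not shown here.
# f(x) = 2x has CDF F(x) = x^2, so the inverse CDF is F^{-1}(u) = sqrt(u).
def f(x):
    return 2.0 * x

def sample(n, rng):
    # Inverse-transform sampling: if U ~ Uniform(0, 1), then F^{-1}(U) ~ f.
    return np.sqrt(rng.uniform(size=n))

# Differential entropy h(f) = -integral of f(x) log f(x) dx, done numerically.
true_h, _ = quad(lambda x: -f(x) * np.log(f(x)), 1e-12, 1.0)
print(true_h)  # analytic value: 1/2 - log 2, about -0.1931

x = sample(10**4, np.random.default_rng(0))
```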





Week 2: Use the JVHW Shannon entropy estimator code at https://github.com/EEthinker/JVHW_Entropy_Estimators to implement this differential entropy estimator. Try to tune the parameter h and plot the root mean squared error for sample sizes n = 10^2, 10^3, 10^4, 10^5 for the density constructed in Week 1. Report your findings. What h works the best for this density? (Screenshots of the team's work followed here.)
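To make the task concrete, here is a hedged sketch of the discretize-then-correct pipeline it describes: quantizing [0, 1] into bins of width h approximates f by a piecewise constant density, so the differential entropy is roughly the discrete entropy of the binned samples plus log of the bin width. The JVHW estimator itself lives in the linked repository; the naive plug-in estimator below is only a stand-in, and the density is the hypothetical f(x) = 2x from the Week 1 sketch:

```python
import numpy as np

def plugin_discrete_entropy(counts):
    # Empirical (plug-in) Shannon entropy in nats. In the real pipeline the
    # JVHW estimator from the linked repository would be dropped in here.
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def diff_entropy_estimate(x, h):
    # Quantize [0, 1] into bins of width roughly h; this approximates f by a
    # piecewise constant density, so h(f) is about H(binned X) + log(width).
    bins = int(np.ceil(1.0 / h))
    counts, _ = np.histogram(x, bins=bins, range=(0.0, 1.0))
    return plugin_discrete_entropy(counts) + np.log(1.0 / bins)

rng = np.random.default_rng(0)
true_h = 0.5 - np.log(2.0)  # entropy of the hypothetical f(x) = 2x
for n in [10**2, 10**3, 10**4, 10**5]:
    est = [diff_entropy_estimate(np.sqrt(rng.uniform(size=n)), h=0.05)
           for _ in range(50)]
    print(n, np.sqrt(np.mean((np.array(est) - true_h) ** 2)))  # RMSE
```

Sweeping h and plotting the printed RMSE values reproduces the shape of the experiment: the best h balances the bias of the piecewise constant approximation against the variance of the discrete entropy estimate.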




——Any useful software or resources you would recommend?

For software: MATLAB is the standard tool for mathematical modeling, and it is also great for producing all kinds of professional plots. For scripting, Python and R are both very common choices, with plenty of convenient libraries.

I would especially recommend Python, because its joblib package provides simple parallel computing out of the box, which helped us speed things up and cope with the fairly large volume of experiments. Mathematica handles symbolic computation; it helped in the final week when we needed the inverse of the CDF of a rather complicated distribution.
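As a rough illustration of the joblib usage described here (the trial body is a placeholder, not the team's actual code), independent Monte Carlo trials can be fanned out across worker processes:

```python
from joblib import Parallel, delayed
import numpy as np

def one_trial(n, seed):
    # One independent Monte Carlo trial; in the real experiments this would
    # run an entropy estimator on n freshly drawn samples.
    rng = np.random.default_rng(seed)
    x = np.sqrt(rng.uniform(size=n))  # sampler from the Week 1 sketch
    return x.mean()                   # placeholder statistic

# Fan 100 trials out over 4 worker processes with joblib.
results = Parallel(n_jobs=4)(delayed(one_trial)(10**4, s) for s in range(100))
print(np.mean(results))
```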

For report writing, I recommend LaTeX: figures and formulas all come out beautifully, which makes it perfect for typesetting perfectionists like me... As for domain knowledge, we mostly relied on what we learned in class...

If your school offers courses in algorithms, machine learning, or statistics, I would still suggest taking them and learning the material systematically... I don't particularly recommend reading blogs: blog posts are good for getting started and for a rough overview of a field, but not for a deep understanding of its models and methods.

For unfamiliar concepts, Wikipedia is a great reference; its more academic articles usually lay out the ideas and underlying principles very clearly...



Week 3: There exist other approaches that directly aim at estimating the differential entropy. One of the most celebrated is the so-called nearest neighbor method. Use the code at https://github.com/liverlover/lnn/blob/master/readme.pdf and compare its performance with the approach in Week 2. Which one works better for the specific density we constructed in Week 1? (Screenshots of the team's work followed here.)
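For a sense of what a nearest neighbor entropy estimator looks like, here is a sketch of the classical Kozachenko-Leonenko k-NN estimator in one dimension. The lnn repository implements a more refined local nearest neighbor variant; this simple version is only for illustration:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(x, k=3):
    # Kozachenko-Leonenko k-nearest-neighbor differential entropy estimator:
    # h_hat = digamma(n) - digamma(k) + log(V_d) + (d/n) * sum_i log(eps_i),
    # where eps_i is the distance from x_i to its k-th nearest neighbor.
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    n, d = x.shape
    eps = cKDTree(x).query(x, k=k + 1)[0][:, k]  # k-th neighbor, self excluded
    log_vd = np.log(2.0)  # volume of the unit ball in one dimension
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(eps))

rng = np.random.default_rng(0)
x = np.sqrt(rng.uniform(size=10**4))  # hypothetical Week 1 density f(x) = 2x
print(kl_entropy(x))                  # compare with 1/2 - log 2, about -0.1931
```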





Week 4: Now, construct a smoother density on [0, 1] compared to the one in Week 1, and repeat the experiments in Weeks 2 and 3. Which method now works better? Has anything changed? (Screenshots of the team's work followed here.)
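As a hypothetical example of a "smoother" density for Week 4 (the team's actual choice appears only in the screenshots), one could take Beta(2, 2), whose density decays to zero at both endpoints of [0, 1] rather than jumping there, which is exactly the kind of change that affects the boundary behavior discussed in Week 3:

```python
import numpy as np
from scipy.integrate import quad

# Hypothetical smoother density on [0, 1]: Beta(2, 2), f(x) = 6x(1 - x).
# Unlike a density that stays positive up to an endpoint, it vanishes at
# the boundary, easing the boundary bias seen in Week 3.
def f(x):
    return 6.0 * x * (1.0 - x)

# Ground-truth differential entropy for the new benchmark, done numerically.
true_h, _ = quad(lambda x: -f(x) * np.log(f(x)), 1e-12, 1.0 - 1e-12)
print(true_h)  # about -0.1251

# NumPy samples Beta directly; these can be fed straight into the Week 2
# (discretization) and Week 3 (nearest neighbor) estimators sketched above.
rng = np.random.default_rng(0)
x = rng.beta(2.0, 2.0, size=10**4)
```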





Comments from the professor


The project is really impressive! Here are some detailed comments on the weekly reports.

Week 1: The computation is very careful and the plot is beautiful.

Week 2: The analysis is very nice. It is a good observation that reducing to discrete entropy is basically approximating the density with piecewise constant functions...

Week 3: The plot is very interesting! Could you please do more analysis on why the local method is not able to really correct the boundary bias?

Week 4: The selection of densities looks great! I am not sure if selecting h in this way works, because in practice you never know the true density, which makes it impossible to compute h(f) exactly...


——Want to see the team's work in full, along with the professor's detailed comments?

Follow the WeChat official account “清帆远航” and reply with the keyword “斯坦福” to get them~