<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>My programming and machine learning blog</title>
	<atom:link href="http://blog.vene.ro/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.vene.ro</link>
	<description>scikit-learn contributor</description>
	<lastBuildDate>Mon, 22 Apr 2013 08:45:17 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>BibTeX-powered publications list for Pelican with pelican-bibtex</title>
		<link>http://blog.vene.ro/2013/04/22/bibtex-powered-publications-list-for-pelican-with-pelican-bibtex/</link>
		<comments>http://blog.vene.ro/2013/04/22/bibtex-powered-publications-list-for-pelican-with-pelican-bibtex/#comments</comments>
		<pubDate>Mon, 22 Apr 2013 08:45:17 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bibtex]]></category>
		<category><![CDATA[blog]]></category>
		<category><![CDATA[citations]]></category>
		<category><![CDATA[pelican]]></category>
		<category><![CDATA[publications]]></category>
		<category><![CDATA[pybtex]]></category>
		<category><![CDATA[references]]></category>
		<category><![CDATA[static blog]]></category>
		<category><![CDATA[static website]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=613</guid>
		<description><![CDATA[Hook Wouldn&#8217;t you like to manage your academic publications list easily within the context of your static website? Without resorting to external services, or to software like bibtex2html, which is very nice but will then require restyling to fit your templates? Look no more, with the help of pelican-bibtex you can now manage your papers...  <a href="http://blog.vene.ro/2013/04/22/bibtex-powered-publications-list-for-pelican-with-pelican-bibtex/" title="Read BibTeX-powered publications list for Pelican with pelican-bibtex">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<h2> Hook </h2>
<p>Wouldn&#8217;t you like to manage your academic publications list easily within the context of your static website? Without resorting to external services, or to software like <em>bibtex2html</em>, which is very nice but will then require restyling to fit your templates?</p>
<p>Look no further: with the help of <a href="https://github.com/vene/pelican-bibtex">pelican-bibtex</a> you can now manage your papers from within Pelican!</p>
<h2> Backstory </h2>
<p>At <a href="http://fseoane.net">Fabian</a>&#8217;s advice, I started playing around with <a href="http://getpelican.com">Pelican</a>, a static website/blog generator for Python.  I like it better than the other generators I used before, so I chose it the next time I had to set up a website.  I still haven&#8217;t worked up the courage to migrate my current website and blog to it, but I promise I will.</p>
<p>Pelican has a public plugins repository, but they have a license constraint for all contributions.  My plugin isn&#8217;t complicated, but I had to &#8220;reverse engineer&#8221; undocumented parts of the <a href="http://pybtex.sourceforge.net">pybtex</a> API. I think the code I used to render citations programmatically might be useful to others, so I don&#8217;t want to release it under a restrictive license.  For this reason, I publish <a href="https://github.com/vene/pelican-bibtex">pelican-bibtex</a> in my personal GitHub account.</p>
<p>You can see it in action in the <a href="https://github.com/nlp-unibuc/nlp-unibuc-website/">source code</a> for the website I am working on at the moment, the home page of my research group.  Example output generated using pelican-bibtex can be seen <a href="http://nlp-unibuc.github.io/publications.html">here</a>.</p>
<h2> Possible extensions </h2>
<p>I have not dug in too deeply but I believe this plugin can be extended, with not much difficulty, to support referencing in Pelican blogs, and render BibTeX references at the end of every post.  This idea was suggested by Avaris on #pelican, and I find it very cool.  Since I don&#8217;t need this feature at the moment, it&#8217;s not a priority, but it&#8217;s something that I would like to see at some point.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2013/04/22/bibtex-powered-publications-list-for-pelican-with-pelican-bibtex/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Really the most common english idioms?</title>
		<link>http://blog.vene.ro/2013/02/11/really-most-common-english-idioms/</link>
		<comments>http://blog.vene.ro/2013/02/11/really-most-common-english-idioms/#comments</comments>
		<pubDate>Mon, 11 Feb 2013 14:50:18 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[corpus linguistics]]></category>
		<category><![CDATA[nlp]]></category>
		<category><![CDATA[bnc]]></category>
		<category><![CDATA[british national corpus]]></category>
		<category><![CDATA[corpus]]></category>
		<category><![CDATA[fixed expression]]></category>
		<category><![CDATA[fixed phrase]]></category>
		<category><![CDATA[idioms]]></category>
		<category><![CDATA[oec]]></category>
		<category><![CDATA[oxford english corpus]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=578</guid>
		<description><![CDATA[A while back I ran into this blog post and it made me wonder. I&#8217;m not a native speaker but the idiomatic phrases that they note as common don&#8217;t strike me as such. I don&#8217;t think I have ever encountered them very often in real dialogue. The blog post lists the 10 most common idioms...  <a href="http://blog.vene.ro/2013/02/11/really-most-common-english-idioms/" title="Read Really the most common english idioms?">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>A while back I ran into <a href="http://voxy.com/blog/index.php/2012/02/top-10-most-common-idioms-in-english/">this blog post</a> and it made me wonder. I&#8217;m not a native speaker but the idiomatic phrases that they note as common don&#8217;t strike me as such. I don&#8217;t think I have ever encountered them very often in real dialogue.</p>
<p>The blog post lists the 10 most common idioms in English. <strong>Idioms</strong>, also known less ambiguously as <strong>fixed expressions</strong>, are units of language that span at least two words. Their meaning, relative to the individual meanings of the parts of the phrase, is figurative. Despite this, fixed expressions don&#8217;t qualify as creative language, or exploitations. By definition, most speakers will be familiar with them. </p>
<p>For example, they cite <em>piece of cake</em> as the most common idiomatic expression. This refers to using the phrase to mean that something is easy, that it isn&#8217;t challenging. An example of literal use, however, would be when ordering <em>a piece of cake</em> for dessert in a restaurant.</p>
<p>Everyone knows that language is a perpetually changing thing, so to begin with it&#8217;s even slightly misleading to discuss the commonness of a phrase without giving more context.  The blog post doesn&#8217;t justify the ranking with any numbers anyway, so let&#8217;s take them one by one and find out how common they really are!</p>
<h2> Corpus Linguistics </h2>
<p>The approach we are taking here is known as corpus linguistics. The best way to argue that a certain phrase is common, that something is used with a specific meaning or that some constructions are normal is, under corpus linguistics, not to make up examples that seem reasonable, but to look at <strong>representative collections of text</strong> (corpora) and try to find the examples there. The conclusions you get this way are backed by real-world language use.</p>
<p>An argument often brought against generative linguistics is that it focuses on the (hard) border between grammatical and ungrammatical, and the border is usually defined by made-up examples. This is inappropriate for studying how the norms are exploited in real language use, for example. I refer the interested reader to the work of <a href="http://www.patrickhanks.com/">Patrick Hanks</a> [<a href="#f1">1</a>, <a href="#f2">2</a>].</p>
<p>Corpus linguistics is sensitive to the corpus used. For this example let&#8217;s use two British English corpora: the <a href="http://www.natcorp.ox.ac.uk/">British National Corpus</a> and the <a href="http://oxforddictionaries.com/words/the-oxford-english-corpus">Oxford English Corpus</a>. Measuring by number of words, the latter is around 20 times bigger. The strong point of the BNC is the attention given to the mixing proportions of various domains. The OEC, on the other hand, is larger and more recent. I have a feeling (but I cannot strongly affirm) that the differences in the following results arise from the inclusion in the OEC of blogs dating from the mid-2000s.</p>
<h2> Cognitive salience vs. social salience </h2>
<p>One of the key ideas that motivate corpus approaches is the mismatch between the two. The cognitive salience of something is the ease with which we can recall it, while its social salience is how often it is actually used. An example often used in language is the fixed expression <em>kicking the bucket</em>. It is one of the standard examples of fixed expressions that people give very often when asked. It is supposed to mean <em>dying</em>.</p>
<p>However, big surprise: the BNC has only 18 instances of this phrase, out of which only 3 are idiomatic, the rest being either literal or metalinguistic. This is a nice example of the salience contrast, but we mustn&#8217;t jump to conclusions. The OEC has 193 examples (still few, relative to its size) but a lot more of them are idiomatic uses. To save time, I didn&#8217;t look at all the examples, but took a random sample of size 18, to compare the relative frequencies to the BNC. Here, 15 out of 18 instances are idiomatic and none are meta. Quite a difference!</p>
<p>This goes to show the importance of context when we draw conclusions about language use. Now let&#8217;s tackle the  list with a similar analysis.</p>
<h2>The idioms</h2>
<ol>
<li><strong>Piece of cake</strong>
<p>In BNC, this phrase occurs 51 times. In 29 of these occurrences, however, the meaning is literal. In OEC we find 601 occurrences. In a random sample of size 51 we find 12 literal uses.</p>
</li>
<li><strong>Costing an arm and a leg</strong>
<p>For flexibility we search for the phrase <em>an arm and a leg</em>. In BNC it can be found 29 times: one literal, four with the verb <em>to pay</em>, and 16 with <em>to cost</em>. In OEC it appears 228 times. We take, again, a sample of size 29 and find no literal uses, 16 with <em>to cost</em>, four with <em>to pay</em>, three with <em>to charge</em> and a few different uses. The figurative meaning is the same in all cases: a lot of money.</p>
</li>
<li><strong>Break a leg</strong>
<p>BNC: 16 hits, 13 of which are literal. OEC: 70 hits; in a sample of 16, 10 were literal.</p>
</li>
<li><strong>Hitting the books</strong>
<p>BNC: 1 occurrence of <em>hit the record books</em>, which has a different meaning. The idiom is never used. OEC: 135, one of which literal.</p>
</li>
<li><strong>Letting the cat out of the bag</strong>
<p> We just looked for cooccurrences of <em>cat</em> in the context of the phrase <em> out of the bag </em>.<br />
BNC: 19, out of which 3 metalinguistic/literal. OEC: 298, and out of a sample of 19, all were idiomatic.</p>
</li>
<li><strong>Hitting the nail on the head</strong>
<p>BNC: 12 instances, all idiomatic. OEC: 484, and out of a sample of 12 all were idiomatic.</p>
</li>
<li><strong>When pigs fly</strong>
<p>
We looked for the lemma <em>fly</em> before the word <em>pigs</em>, therefore catching multiple variations.<br />
BNC: 17 hits, OEC: 240.</p>
</li>
<li><strong>Judging a book by its cover</strong>
<p> We looked for the fixed phrase <em>book by its cover</em>, because the leading verb might vary.<br />
In the BNC, 11 instances (1 of them with tell instead of judge). In OEC, 195 instances. Sampling 11, all were idiomatic.</p>
</li>
<li><strong>Biting off more than one can chew</strong>
<p>
BNC: 16 occurrences, one of which with &#8220;to take&#8221; instead of &#8220;to bite&#8221;. OEC: 231, all idiomatic after sampling 16.
</p>
</li>
<li><strong>Scratching one&#8217;s back</strong>
<p>BNC: 23, out of which only 5 idiomatic. OEC: 756, 5/23 idiomatic.</p>
</li>
</ol>
<h2> Recalculating the rank </h2>
<p>We now have enough data to reorder the expressions and compare. The result will be more approximate for the OEC because of our use of small subsamples to estimate the frequencies, but hopefully it will still be interesting. The way we are estimating the counts for the OEC is as follows: take, for instance, <em>break a leg</em>. It was found 70 times, and out of a sample of 16, 10 were literal. The expected number of idiomatic uses is therefore:<br />
<center>\(n = \left ( 1 - \frac{10}{16} \right ) \cdot 70 = 26.25\)</center><br />
Repeating this computation and skipping a ton of steps leads to the following rankings:</p>
<div style="float: left; margin-left: 5em;">
<p><strong>In the British National Corpus:</strong></p>
<ol>
<li>Costing an arm and a leg</li>
<li>Piece of cake</li>
<li>When pigs fly</li>
<li>Letting the cat out of the bag</li>
<li>Biting off more than one can chew</li>
<li>Hitting the nail on the head</li>
<li>Judging a book by its cover</li>
<li>Scratching one’s back</li>
<li>Break a leg</li>
<li>Hitting the books</li>
</ol>
</div>
<div style="float: right; margin-right: 5em;">
<p><strong>In the Oxford English Corpus:</strong></p>
<ol>
<li>Hitting the nail on the head</li>
<li>Piece of cake</li>
<li>Letting the cat out of the bag</li>
<li>When pigs fly</li>
<li>Biting off more than one can chew</li>
<li>Costing an arm and a leg</li>
<li>Judging a book by its cover</li>
<li>Scratching one’s back</li>
<li>Hitting the books</li>
<li>Break a leg</li>
</ol>
</div>
<p><br style="clear: both;"/><br />
We can see that apart from the apparent switching of <em>hitting the nail on the head</em> with <em>costing an arm and a leg</em>, the rankings are not too different. We can quantify this by using the <strong>Rank Distance</strong>, a metric introduced by Liviu P. Dinu [<a href="#f3">3</a>, <a href="#f4">4</a>]. Here, all three of our rankings are over the same domain: we are not looking for the most frequent idioms in the corpora, which would be very hard. We are just reordering the proposed ranking according to the occurrences in the BNC and the OEC. In this simple case, Rank Distance reduces to the \(\ell_1\) distance over rank position vectors. The weighted Rank Distance, bounded on \([0, 1]\), is in this case obtained by dividing by a scaling factor of \(0.5k^2\), where <em>k</em> is the length of the rankings (10 in our case).</p>
<p>The computed distance between the original ranking and the BNC reordering is 0.52. Between the original and the OEC reordering, it is 0.68. Our two reorderings are much closer: the distance is 0.28. This is mostly because the permutations between the two reorderings affect the top position, and are therefore weighted more.</p>
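<p>To make the computation concrete, here is a minimal sketch of this simplified distance in Python. The function name <code>scaled_rank_distance</code> and the short labels are made up for illustration, and the code only covers the case described above (both rankings contain the same items, with no ties), not the general Rank Distance of [3, 4].</p>

```python
# Short labels for the ten idioms, in the order of the original blog post.
original = ["cake", "arm", "leg", "books", "cat",
            "nail", "pigs", "cover", "chew", "back"]
# The two reorderings listed above.
bnc = ["arm", "cake", "pigs", "cat", "chew",
       "nail", "cover", "back", "leg", "books"]
oec = ["nail", "cake", "cat", "pigs", "chew",
       "arm", "cover", "back", "books", "leg"]


def scaled_rank_distance(ranking_a, ranking_b):
    """l1 distance between rank positions, divided by 0.5 * k**2."""
    k = len(ranking_a)
    position_in_b = {item: pos for pos, item in enumerate(ranking_b)}
    l1 = sum(abs(pos - position_in_b[item])
             for pos, item in enumerate(ranking_a))
    return l1 / (0.5 * k ** 2)


print(round(scaled_rank_distance(original, bnc), 2))
print(round(scaled_rank_distance(original, oec), 2))
print(round(scaled_rank_distance(bnc, oec), 2))
```

<p>With the three rankings from this post as input, the sketch gives 0.52, 0.68 and 0.28, in that order.</p>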
<p>It&#8217;s also interesting to look at the ratio of the counts. They differ approximately by a constant factor, not far from the relative size difference of the two corpora, as would be expected.</p>
<p>We have to throw away <em>hitting the books</em> because its zero count in the BNC leads to division by zero. After this step, the average ratio between the OEC and BNC counts of the idioms is 19.5, with a standard deviation of 10.1, while the OEC is supposed to have around 20 times more words than the BNC.</p>
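<p>For readers who want to retrace these numbers, here is a rough sketch in Python. The counts are one reading of the figures in the list of idioms, under the assumption that every hit not reported as literal or metalinguistic is idiomatic, and that OEC hits without a reported sample breakdown are all idiomatic. The variable names are made up for illustration.</p>

```python
from statistics import mean, pstdev

# Idiomatic counts in the BNC (hits minus literal/metalinguistic uses).
bnc = {
    "cake": 51 - 29,
    "arm": 29 - 1,
    "leg": 16 - 13,
    "cat": 19 - 3,
    "nail": 12,
    "pigs": 17,
    "cover": 11,
    "chew": 16,
    "back": 5,
}

# OEC: (total hits, idiomatic share observed in the random sample).
oec = {
    "cake": (601, (51 - 12) / 51),
    "arm": (228, 1.0),
    "leg": (70, (16 - 10) / 16),
    "cat": (298, 1.0),
    "nail": (484, 1.0),
    "pigs": (240, 1.0),
    "cover": (195, 1.0),
    "chew": (231, 1.0),
    "back": (756, 5 / 23),
}

# "hitting the books" is left out: its BNC count is zero.
ratios = [total * share / bnc[idiom] for idiom, (total, share) in oec.items()]
mean_ratio = mean(ratios)
std_ratio = pstdev(ratios)  # population standard deviation
print(round(mean_ratio, 1), round(std_ratio, 1))
```

<p>Under these assumptions the mean ratio rounds to 19.5 and the population standard deviation to 10.1, matching the figures quoted above.</p>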
<h2>Conclusions</h2>
<p>Well, it seems people don&#8217;t say <em>break a leg</em> and <em>let&#8217;s hit the books</em> as often as the original author claims. The popularity of most of the cited idioms seems supported by the data, but we have no easy way to find other idioms that might turn out to be much more frequent. Corpus linguistics is a reliable way to measure the social salience of language patterns. It should always be used to verify or back up otherwise empty claims of the form <em>X is correct</em>, <em>Y is frequent</em> or <em>Nobody says Z</em>. </p>
<p>[<span id="f1">1</span>] Patrick Hanks, <a href="http://www.patrickhanks.com/uploads/5/1/4/9/5149363/howpeopleusewordstomakemeanings.pdf"> How people use words to make meanings</a>.<br />
[<span id="f2">2</span>] Patrick Hanks, <a href="http://www.amazon.com/Lexical-Analysis-Exploitations-Patrick-Hanks/dp/0262018578">Lexical Analysis: Norms and Exploitations</a>. The MIT Press (January 25, 2013)<br />
[<span id="f3">3</span>] Liviu P. Dinu, Florin Manea. <a href="http://dl.acm.org/citation.cfm?id=1167105">An efficient approach for the rank aggregation problem</a>. In: Theoretical Computer Science, Volume 359 Issue 1, 14 August 2006. Pages 455 &#8211; 461.<br />
[<span id="f4">4</span>] Liviu P. Dinu, <a href="http://dl.acm.org/citation.cfm?id=937465">On the Classification and Aggregation of Hierarchies with Different Constitutive Elements</a>. Fundam. Inform. 55(1): 39-50 (2003)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2013/02/11/really-most-common-english-idioms/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Scikit-learn-speed: An overview on the final day</title>
		<link>http://blog.vene.ro/2012/08/20/scikit-learn-speed-an-overview-on-the-final-day/</link>
		<comments>http://blog.vene.ro/2012/08/20/scikit-learn-speed-an-overview-on-the-final-day/#comments</comments>
		<pubDate>Mon, 20 Aug 2012 00:44:07 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[gsoc]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[scikit-learn-speed]]></category>
		<category><![CDATA[speedup]]></category>
		<category><![CDATA[summary]]></category>
		<category><![CDATA[vbench]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=518</guid>
		<description><![CDATA[This summer, I was granted the project called scikit-learn-speed, consisting of developing a benchmarking platform for scikit-learn and using it to find potential speedups, and in the end, make the library go faster wherever I can. On the official closing day of this work, I&#8217;d like to take a moment and recall the accomplishments and...  <a href="http://blog.vene.ro/2012/08/20/scikit-learn-speed-an-overview-on-the-final-day/" title="Read Scikit-learn-speed: An overview on the final day">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>This summer, I was granted the project called <em>scikit-learn-speed</em>, consisting of developing a benchmarking platform for <em>scikit-learn</em> and using it to find potential speedups, and in the end, make the library go faster wherever I can.</p>
<p>On the official closing day of this work, I&#8217;d like to take a moment and recall the accomplishments and failures of this project, and all the lessons to be learned.</p>
<h2>The <em>scikit-learn-speed</em> benchmark platform</h2>
<p><a href="http://jenkins-scikit-learn.github.com/scikit-learn-speed/"><img src="http://blog.vene.ro/wp-content/uploads/2012/08/skl-speed-300x163.png" alt="" title="skl-speed" width="300" height="163" class="aligncenter size-medium wp-image-533" /></a><br />
<a href="http://jenkins-scikit-learn.github.com/scikit-learn-speed/"><em>Scikit-learn-speed</em></a> is a continuous benchmark suite for the <a href="http://scikit-learn.org"><em>scikit-learn</em></a> library. It has the following features:</p>
<ul>
<li><em>vbench</em>-powered integration with Git</li>
<li>Easily triggered build and report generation: just type <code>make</code></li>
<li>Easily readable and writeable template for benchmarks:
<pre class="brush: python; title: ; notranslate">
    {
     'obj': 'LogisticRegression',
     'init_params': {'C': 1e5},
     'datasets': ('arcene', 'madelon'),
     'statements': ('fit', 'predict')
    }, ...
</pre>
</li>
<li>Many attributes recorded: time (w/ estimated standard deviation), memory usage, cProfile output, line_profiler output, tracebacks</li>
<li>Multi-step benchmarks: i.e. <code>fit</code> followed by <code>predict</code></li>
</ul>
<p>What were the lessons I learned here?</p>
<h3>Make your work reusable: the trade-off between good design and get-it-working-now</h3>
<p>For the task of rolling out a continuous benchmarking platform, we decided pretty early in the project to adopt Wes McKinney&#8217;s <em>vbench</em>. If my goal had been to maintain <em>vbench</em> and extend it into a multi-purpose, reusable benchmarking framework, the work would&#8217;ve been structured differently. It also would have been very open-ended and difficult to quantify.</p>
<p>As things stood, I came up with features that we needed in <em>scikit-learn-speed</em>, and tried to implement them in <em>vbench</em> without refactoring too much, while still trying to make them as reusable as possible.</p>
<p>The result? I got all the features for <em>scikit-learn-speed</em>, but the implementation is not yet clean enough to be merged into <em>vbench</em>. This is fine for a project with a tight deadline such as this one: after it&#8217;s done, I will just spend another weekend on cleaning the work up and making sure it&#8217;s appreciated upstream. This will be easier because of the constraint to keep compatibility with <em>scikit-learn-speed</em>.</p>
<h3>Never work quietly (unless you&#8217;re a ninja)</h3>
<p>I know some students who prefer that the professor doesn&#8217;t even know they exist until the final, when they would score an A, and (supposedly) leave the professor amazed. In real life, plenty of people would be interested in what you are doing, as long as they know about it. The PSF goes a long way to help this, with the &#8220;blog weekly&#8221; rule. In the end, however, it&#8217;s all up to you to make sure that everybody who should know finds out about your work. It will spare the world the duplicated work, the abandoned projects, but most importantly, those people could point you to things you have missed. Try to mingle in real-life as well, attend conferences, meetups, coding sprints.</p>
<p>I was able to slightly &#8220;join forces&#8221; with a couple of people who contacted me about my new <em>vbench</em> features (Hi Jon and Joel!). I have shaped my design slightly towards their requirements as well, and hopefully the result will be a more general <em>vbench</em>.</p>
<h2>The speedups</h2>
<p>Once <em>scikit-learn-speed</em> was up and running, I couldn&#8217;t believe how useful it is to be able to scroll, catch slow code and jump straight to the profiler output with one click. I jumped on the following speed-ups:</p>
<ul>
<li>Multiple outputs in linear models. (<a href="https://github.com/scikit-learn/scikit-learn/pull/913">PR</a>)
<p>Some of them proved trickier than expected, so I haven&#8217;t implemented it for the whole module yet, but it is ready for some estimators.</p>
</li>
<li>Fewer callable functions passed around in <code>FastICA</code> (<a href="https://github.com/scikit-learn/scikit-learn/pull/927">merged</a>)</li>
<li>Speed up <code>euclidean_distances</code> by rewriting in Cython. (<a href="https://github.com/scikit-learn/scikit-learn/pull/1006">PR</a>)
<p>This meant making more operations support an <code>out</code> argument, for passing preallocated memory. This touches many different objects in the codebase: clustering, manifold learning, nearest neighbour methods.</p></li>
<li><a href="http://blog.vene.ro/2012/08/18/inverses-pseudoinverses-numerical-issues-speed-symmetry/" title="Inverses and pseudoinverses. Numerical issues, speed, symmetry.">Insight into inverse and pseudoinverse computation</a>, new <code>pinvh</code> function for inverting symmetric/hermitian matrices. (<a href="https://github.com/scikit-learn/scikit-learn/pull/1015">PR</a>)
<p>This speeds up the covariance module (especially <code>MinCovDet</code>), <code>ARDRegression</code> and the mixture models. It also led to an <a href="https://github.com/scipy/scipy/pull/289">upstream contribution to SciPy</a>.</p></li>
<li><code>OrthogonalMatchingPursuit</code> forward stepwise path for cross-validation (<a href="https://github.com/scikit-learn/scikit-learn/pull/1042">PR</a>)
<p>This is only halfway finished, but it will lead to faster and easier optimization of the <code>OMP</code> sparsity parameter.</li>
</ul>
<p>Lessons? These will be pretty obvious.</p>
<h3>Write tests, tests, tests!</h3>
<p>This is a no-brainer, but it still didn&#8217;t stick. In that one case out of 10 that I didn&#8217;t explicitly test, a bug was obviously hiding. When you want to add a new feature, it&#8217;s best to start by writing a failing test, and then <a href="http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast">making it pass</a>. Sure, you will miss tricky bugs, but you will never have embarrassing, obvious bugs in your code <img src='http://blog.vene.ro/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3>Optimization doesn&#8217;t have to be ugly</h3>
<p>Developers often shun optimization. It&#8217;s true, you should profile first, and you shouldn&#8217;t focus on speeding up stuff that is dominated by other computations that are orders of magnitude slower. However, there is an elephant in the room: the assumption that making code faster invariably makes it less clear, and takes a lot of effort.</p>
<p>The following code is a part of scipy&#8217;s <code>pinv2</code> function as it currently is written:</p>
<pre class="brush: python; title: ; notranslate">
    cutoff = cond*np.maximum.reduce(s)
    psigma = np.zeros((m, n), t)
    for i in range(len(s)):
        if s[i] &gt; cutoff:
            psigma[i,i] = 1.0/np.conjugate(s[i])
    return np.transpose(np.conjugate(np.dot(np.dot(u,psigma),vh)))
</pre>
<p><code>psigma</code> is a diagonal matrix, and some time and memory can be saved with simple vectorization. However, this part of the code is dominated by the preceding call to <code>svd</code>. The profiler output would say that we shouldn&#8217;t bother, but is it really a bother? Look at Jake&#8217;s new version:</p>
<pre class="brush: python; title: ; notranslate">
    above_cutoff = (s &gt; cond * np.max(s))
    psigma_diag = np.zeros_like(s)
    psigma_diag[above_cutoff] = 1.0 / s[above_cutoff]
 
    return np.transpose(np.conjugate(np.dot(u * psigma_diag, vh)))
</pre>
<p>It&#8217;s shorter, more elegant, easier to read, and nevertheless faster. I would say it is worth it.</p>
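<p>As a quick sanity check, the two snippets can be wrapped into standalone functions and compared. This is only a sketch and not the actual SciPy code: the function names, the call to <code>np.linalg.svd</code> with <code>full_matrices=False</code>, the default value of <code>cond</code> and the real-valued random test matrix are all assumptions made for the example.</p>

```python
import numpy as np


def pinv_loop(a, cond=1e-10):
    """Pseudoinverse via SVD, filling the diagonal with an explicit loop."""
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    cutoff = cond * np.max(s)
    psigma = np.zeros((len(s), len(s)))
    for i in range(len(s)):
        if s[i] > cutoff:
            psigma[i, i] = 1.0 / s[i]
    return np.transpose(np.conjugate(np.dot(np.dot(u, psigma), vh)))


def pinv_vectorized(a, cond=1e-10):
    """Same result, scaling the columns of u instead of building psigma."""
    u, s, vh = np.linalg.svd(a, full_matrices=False)
    above_cutoff = s > cond * np.max(s)
    psigma_diag = np.zeros_like(s)
    psigma_diag[above_cutoff] = 1.0 / s[above_cutoff]
    return np.transpose(np.conjugate(np.dot(u * psigma_diag, vh)))


rng = np.random.default_rng(0)
a = rng.standard_normal((6, 4))
print(np.allclose(pinv_loop(a), pinv_vectorized(a)))
print(np.allclose(pinv_vectorized(a), np.linalg.pinv(a)))
```

<p>The vectorized variant never materializes the square <code>psigma</code> matrix: multiplying <code>u</code> by a one-dimensional array scales its columns through broadcasting, which replaces one of the two matrix products.</p>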
<h3>Small speed-ups can propagate</h3>
<p>Sure, it&#8217;s great if you can compute an inverse two times faster, say in 0.5s instead of 1s. But if some algorithm calls this function in a loop that might iterate 100, 300, or 1000 times, this small speed-up seems much more important, doesn&#8217;t it?</p>
<p>What I&#8217;m trying to say with this is that in a well-engineered system, a performance improvement to a relatively small component (such as the function that computes a pseudoinverse) can lead to multiple spread out improvements. Be careful of the double edge of this sword, a bug introduced in a small part can cause multiple failures downstream. But you <em>are</em> fully covered by your test suite, aren&#8217;t you? </p>
<p>Overall it has been a fruitful project that may not have resulted in a large number of speed-ups, but a few considerable ones nonetheless. And I venture the claim that the <em>scikit-learn-speed</em> tool will prove useful over time, and that the efforts deployed during this project have stretched beyond the boundaries of <em>scikit-learn</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/08/20/scikit-learn-speed-an-overview-on-the-final-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Inverses and pseudoinverses. Numerical issues, speed, symmetry.</title>
		<link>http://blog.vene.ro/2012/08/18/inverses-pseudoinverses-numerical-issues-speed-symmetry/</link>
		<comments>http://blog.vene.ro/2012/08/18/inverses-pseudoinverses-numerical-issues-speed-symmetry/#comments</comments>
		<pubDate>Sat, 18 Aug 2012 17:41:04 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[inv]]></category>
		<category><![CDATA[matrix inverse]]></category>
		<category><![CDATA[numerical analysis]]></category>
		<category><![CDATA[numerical methods]]></category>
		<category><![CDATA[pinv]]></category>
		<category><![CDATA[pinvh]]></category>
		<category><![CDATA[positive semidefinite]]></category>
		<category><![CDATA[pseudoinverse]]></category>
		<category><![CDATA[symmetric]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=497</guid>
		<description><![CDATA[The matrix inverse is a cornerstone of linear algebra, taught, along with its applications, since high school. The inverse of a matrix \(A\), if it exists, is the matrix \(A^{-1}\) such that \(AA^{-1} = A^{-1}A = I_n\). Based on the requirement that the left and right multiplications should be equal, it follows that it only...  <a href="http://blog.vene.ro/2012/08/18/inverses-pseudoinverses-numerical-issues-speed-symmetry/" title="Read Inverses and pseudoinverses. Numerical issues, speed, symmetry.">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>The matrix inverse is a cornerstone of linear algebra, taught, along with its applications, since high school. The inverse of a matrix \(A\), if it exists, is the matrix \(A^{-1}\) such that \(AA^{-1} = A^{-1}A = I_n\). Based on the requirement that the left and right multiplications should be equal, it follows that it only makes sense to speak of inverting square matrices. But just the square shape is not enough: for a matrix \(A\) to have an inverse, \(A\) must be full rank.</p>
<p>The inverse provides an elegant (on paper) method of finding solutions to systems of \(n\) equations with \(n\) unknowns, which correspond to solving \(Ax = b\) for \(x\). If we&#8217;re lucky and \(A^{-1}\) exists, then we can find \(x = A^{-1}b\). For this to work, it must be the case that:</p>
<ul>
<li>We have exactly as many unknowns as equations</li>
<li>No equation is redundant, i.e. can be expressed as a linear combination of the others</li>
</ul>
<p>In this setting, there is a unique solution for \(x\).</p>
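<p>A small numerical illustration of this setting, as a sketch using NumPy (the \(2 \times 2\) system is made up for the example):</p>

```python
import numpy as np

# Two independent equations in two unknowns:
#   2*x0 + 1*x1 = 3
#   1*x0 + 3*x1 = 5
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

# A is square and full rank, so the inverse exists.
full_rank = np.linalg.matrix_rank(A) == A.shape[0]

x_via_inverse = np.dot(np.linalg.inv(A), b)  # x = A^{-1} b
x_via_solve = np.linalg.solve(A, b)          # solves Ax = b directly

print(full_rank, x_via_inverse, x_via_solve)
```

<p>Both routes give the same unique solution here, \(x = (0.8, 1.4)\); <code>np.linalg.solve</code> gets there without forming \(A^{-1}\) explicitly.</p>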
<h2>The Moore-Penrose pseudoinverse</h2>
<p>What if we have more equations than unknowns? It is most likely the case that we cannot satisfy all the equations perfectly, so let&#8217;s settle for a solution that best fits the constraints, in the sense of minimising the sum of squared errors. We solve \(\operatorname{arg\,min}_x ||b - Ax||\).</p>
<p>And how about the other extreme, where we have a lot of unknowns, but just a few equations constraining them? We will probably have an infinity of solutions, so how can we choose one? A popular choice is to take the one of least \(\ell_2\) norm: \(\operatorname{arg\,min}_x ||x|| \operatorname{s.t.} Ax = b\). Is there a way to generalize the idea of a matrix inverse for this setting?</p>
<p>The pseudoinverse of an arbitrary-shaped matrix \(A\), written \(A^{+}\), has the same shape as \(A^{T}\) and solves our problem: the answer to both optimization problems above is given by \(x = A^{+}b\).</p>
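<p>Both cases can be tried out with NumPy&#8217;s <code>np.linalg.pinv</code>. The sketch below uses two tiny made-up systems: a tall one (three equations, two unknowns) compared against <code>np.linalg.lstsq</code>, and a wide one (one equation, two unknowns) compared against another valid solution picked by hand.</p>

```python
import numpy as np

# More equations than unknowns: fit intercept and slope to three points.
A_tall = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b_tall = np.array([1.0, 2.0, 2.0])
x_least_squares = np.dot(np.linalg.pinv(A_tall), b_tall)
x_reference = np.linalg.lstsq(A_tall, b_tall, rcond=None)[0]

# Fewer equations than unknowns: x0 + x1 = 2 has infinitely many solutions.
A_wide = np.array([[1.0, 1.0]])
b_wide = np.array([2.0])
x_min_norm = np.dot(np.linalg.pinv(A_wide), b_wide)
other_solution = np.array([2.0, 0.0])  # also satisfies the equation

print(x_least_squares, x_reference)
print(x_min_norm, np.linalg.norm(x_min_norm), np.linalg.norm(other_solution))
```

<p>In the wide case the pseudoinverse picks \(x = (1, 1)\), whose norm \(\sqrt{2}\) is smaller than that of the hand-picked alternative \((2, 0)\).</p>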
<p>The theoretical definition of the pseudoinverse is given by the following conditions. The intuitive way to read them is as properties of \(AA^+\) or \(A^+A\):</p>
<ul>
<li>\(AA^+A = A\)</li>
<li>\(A^+AA^+ = A^+\)</li>
<li>\((AA^+)^T = AA^+\)</li>
<li>\((A^+A)^T = A^+A\)</li>
</ul>
<p>These conditions do not however give us a way to get our hands on a pseudoinverse, so we need something else.</p>
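<p>Given a candidate matrix, though, the four conditions are straightforward to verify. Here is a small sketch that checks them for the output of <code>numpy.linalg.pinv</code>; the test matrix and the dictionary of labels are illustrative assumptions.</p>

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed
A = rng.standard_normal((5, 3))       # an arbitrary non-square matrix
A_pinv = np.linalg.pinv(A)            # candidate pseudoinverse, shape (3, 5)

# The four defining conditions, checked up to floating point tolerance.
conditions = {
    "A A+ A = A": np.allclose(A @ A_pinv @ A, A),
    "A+ A A+ = A+": np.allclose(A_pinv @ A @ A_pinv, A_pinv),
    "(A A+)^T = A A+": np.allclose((A @ A_pinv).T, A @ A_pinv),
    "(A+ A)^T = A+ A": np.allclose((A_pinv @ A).T, A_pinv @ A),
}

for name, holds in conditions.items():
    print(name, holds)
```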
<h2>How to compute the pseudoinverse on paper</h2>
<p>The first time I ran into the pseudoinverse, I didn&#8217;t even know its definition, only the closed-form solution of the least squares problem, given as:</p>
<p>\(A^+ = (A^T A)^{-1}A^T\)</p>
<p>What can we see from this expression?</p>
<ul>
<li>It gives us a way to compute the pseudoinverse, and hence to solve the problem</li>
<li>If \(A\) is actually invertible, it means \(A^T\) is invertible, so we have \(A^+ = A^{-1}(A^T)^{-1}A^T = A^{-1}\)</li>
<li>Something bad happens if \(A^TA\) is not invertible.</li>
</ul>
<p>The pseudoinverse is still defined, and unique, when \(A^TA\) is not invertible, but we cannot use the expression above to compute it.</p>
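<p>To make this concrete, here is a hedged sketch comparing the closed-form expression with a library pseudoinverse, and then breaking it by duplicating a column. The sizes, the seed and the <code>1e-10</code> cutoff are assumptions chosen for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
A = rng.standard_normal((8, 3))           # tall, full column rank

# Closed-form expression, valid because A^T A is invertible here.
formula = np.linalg.inv(A.T @ A) @ A.T
agrees = np.allclose(formula, np.linalg.pinv(A))

# Duplicate a column: A^T A is now 4 x 4 but singular.
A_def = np.column_stack([A, A[:, 0]])
s = np.linalg.svd(A_def.T @ A_def, compute_uv=False)
n_nonzero = int(np.sum(s > 1e-10 * s[0]))  # counts the non-negligible singular values

# The pseudoinverse itself is still well defined (small singular values are cut off).
A_def_pinv = np.linalg.pinv(A_def, rcond=1e-10)
still_penrose = np.allclose(A_def @ A_def_pinv @ A_def, A_def)
```

<p>Here <code>n_nonzero</code> comes out as 3 rather than 4, so \((A^TA)^{-1}\) does not exist for the second matrix, yet <code>A_def_pinv</code> still satisfies the first defining condition.</p>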
<h2>Numerical issues</h2>
<p>Before going on, we should clarify and demystify some of the urban legends about numerical computation of least squares problems. You might have heard the following unwritten rules: </p>
<ol>
<li>Never compute \(A^{-1}\), solve the system directly</li>
<li>If you really need \(A^{-1}\), use <code>pinv</code> and not <code>inv</code></li>
</ol>
<p>The first of these rules is based on some misguided beliefs, but is still good advice. If your goal is a one-shot answer to a system, there&#8217;s no use in explicitly computing a possibly large inverse, when all you need is \(x\). But <a href="http://arxiv.org/abs/1201.6035">this paper</a> shows that computing the inverse is not necessarily a bad thing. The key to this is conditional accuracy, and as long as the <code>inv</code> function used has good conditional bounds, you will get as good results as with a least squares solver.</p>
<p>The second rule comes from numerical stability, and will definitely bite you if misunderstood. If \(A\) is a square matrix with a row full of zeros, it&#8217;s clearly not invertible, so an algorithm attempting to compute the inverse will fail and you will be able to catch that failure. But what if the row is not exactly zero, but the sum of several other rows, and a slight loss of precision is propagated at every step?</p>
<h2>Numerical rank vs. actual rank</h2>
<p>The rank of a matrix \(A\) is defined as the number of linearly independent rows (or equivalently, columns) in \(A\). In other words, the number of non-redundant equations in the system. We&#8217;ve seen before that if the rank is less than the total number of rows, the system cannot have a unique solution anymore, so the matrix \(A\) is not invertible.</p>
<p>Computing the rank of a matrix is tricky. On paper, with small matrices, you would look at minors of decreasing size, until you find the first non-zero one. This is infeasible to implement on a computer, so numerical analysis takes a different approach. Enter the singular value decomposition!</p>
<p>The SVD of a matrix \(A\) is \(A = USV^{T}\), where \(S\) is diagonal and \(U, V\) are orthogonal. The elements on the diagonal of \(S\) are called the singular values of \(A\). It can be seen that to get a row full of zeros when multiplying three such matrices, a singular value needs to be exactly zero. </p>
<p>The ugly thing that could happen is that one (or usually more) singular values are not exactly zero, but very low values, due to propagated imprecision. Why is this a problem? By looking at the SVD and noting its properties, it becomes clear that \(A^{-1} = VS^{-1}U^{T}\) and since \(S\) is diagonal, its inverse is formed by taking the inverse of all the elements on the diagonal. But if a singular value is very small but not quite zero, its inverse is very large and it will blow up the whole computation of the inverse. The right thing to do here is either to tell the user that \(A\) is numerically rank deficient, or to return a pseudoinverse instead. A pseudoinverse would mean: give up on trying to get \(AA^+\) to be the identity matrix, simply aim for a diagonal matrix with approximately ones and zeroes. In other words, when singular values are very low, set them to 0.</p>
<p>How do you set the threshold? This is actually a delicate issue, being discussed on <a href="http://thread.gmane.org/gmane.comp.python.numeric.general/50396/focus=50912">the numeric Python mailing list</a>.</p>
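<p>The thresholding idea can be sketched in a few lines. This is not the source of any Scipy function: <code>pinv_via_svd</code> is a hypothetical helper, and the relative cutoff <code>rcond=1e-10</code> is an assumption picked for this example.</p>

```python
import numpy as np


def pinv_via_svd(a, rcond=1e-10):
    """Pseudoinverse from the SVD, zeroing singular values below rcond * max."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]        # invert only the trusted singular values
    # A+ = V S+ U^T; scaling the columns of V avoids building a diagonal matrix.
    return (vt.T * s_inv) @ u.T


rng = np.random.default_rng(0)         # arbitrary seed
b = rng.standard_normal((20, 3))
a = b @ b.T                            # 20 x 20, but only rank 3

a_pinv = pinv_via_svd(a)
penrose_ok = np.allclose(a @ a_pinv @ a, a)
matches_numpy = np.allclose(a_pinv, np.linalg.pinv(a, rcond=1e-10))
```

<p>With the same cutoff, the result should agree with <code>numpy.linalg.pinv</code>, which follows the same SVD route.</p>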
<h2>Scipy implementations</h2>
<p>Scipy exposes <code>inv</code>, <code>pinv</code> and <code>pinv2</code>. <code><strong>inv</strong></code> secretly invokes LAPACK, that ancient but crazy robust code whose lineage goes back to the 70s, to first compute a pivoted LU decomposition that is then used to compute the inverse. <code><strong>pinv</strong></code> also uses LAPACK, but for computing the least-squares solution to the system \(AX = I\). <code><strong>pinv2</strong></code> computes the SVD and transposes everything as shown above. Both <code><strong>pinv</strong></code> and <code><strong>pinv2</strong></code> expose <code>cond</code> and <code>rcond</code> arguments to handle the treatment of very small singular values, but (<em>attention!</em>) they behave differently!</p>
<p>The different implementations also lead to different speeds. Let&#8217;s look at inverting a random square matrix:</p>
<pre class="brush: python; title: ; notranslate">
In [1]: import numpy as np

In [2]: from scipy import linalg

In [3]: a = np.random.randn(1000, 1000)

In [4]: timeit linalg.inv(a)
10 loops, best of 3: 132 ms per loop

In [5]: timeit linalg.pinv(a)
1 loops, best of 3: 18.8 s per loop

In [6]: timeit linalg.pinv2(a)
1 loops, best of 3: 1.58 s per loop
</pre>
<p>Woah, huge difference! But do all three methods return the &#8220;right&#8221; result?</p>
<pre class="brush: python; title: ; notranslate">
In [7]: linalg.inv(a)[:3, :3]
Out[7]: 
array([[ 0.03636918,  0.01641725,  0.00736503],
       [-0.04575771,  0.03578062,  0.02937733],
       [ 0.00542367,  0.01246306,  0.0122156 ]])

In [8]: linalg.pinv(a)[:3, :3]
Out[8]: 
array([[ 0.03636918,  0.01641725,  0.00736503],
       [-0.04575771,  0.03578062,  0.02937733],
       [ 0.00542367,  0.01246306,  0.0122156 ]])

In [9]: linalg.pinv2(a)[:3, :3]
Out[9]: 
array([[ 0.03636918,  0.01641725,  0.00736503],
       [-0.04575771,  0.03578062,  0.02937733],
       [ 0.00542367,  0.01246306,  0.0122156 ]])

In [10]: np.testing.assert_array_almost_equal(linalg.inv(a), linalg.pinv(a))

In [11]: np.testing.assert_array_almost_equal(linalg.inv(a), linalg.pinv2(a))
</pre>
<p>Looks good! This is because we got lucky, though, and <code>a</code> was invertible to start with. Let&#8217;s look at its spectrum:</p>
<pre class="brush: python; title: ; notranslate">
In [12]: _, s, _ = linalg.svd(a)

In [13]: np.min(s), np.max(s)
Out[13]: (0.029850235603382822, 62.949785645178906)
</pre>
<p>This is a lovely range for the singular values of a matrix, not too small, not too large. But what if we built the matrix in a way that would always pose problems? Specifically, let&#8217;s look at the case of covariance matrices:</p>
<pre class="brush: python; title: ; notranslate">
In [14]: a = np.random.randn(1000, 50)

In [15]: a = np.dot(a, a.T)

In [16]: _, s, _ = linalg.svd(a)

In [17]: s[-9:]
Out[17]: 
array([  7.40548924e-14,   6.48102455e-14,   5.75803505e-14,
         5.44263048e-14,   4.51528730e-14,   3.55317976e-14,
         2.46939141e-14,   1.54186776e-14,   5.08135874e-15])

</pre>
<p><code>a</code> has at least 9 tiny singular values. Actually it&#8217;s easy to see why there are 950 of them:</p>
<pre class="brush: python; title: ; notranslate">
In [18]: np.sum(s &lt; 1e-10)
Out[18]: 950
</pre>
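<p>The reason is that a product \(BB^T\) with \(B\) of shape \(n \times k\) has rank at most \(k\), so at least \(n - k\) singular values must vanish in exact arithmetic. A scaled-down sketch (200 by 10 instead of 1000 by 50, with an arbitrary seed and cutoff, all assumptions for illustration):</p>

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed
n, k = 200, 10                          # smaller than 1000 x 50 so it runs quickly
b = rng.standard_normal((n, k))
a = b @ b.T                             # n x n, rank at most k

s = np.linalg.svd(a, compute_uv=False)  # singular values, largest first
n_large = int(np.sum(s > 1e-10 * s[0])) # values that are not negligible
n_tiny = n - n_large
print(n_large, n_tiny)
```

<p>Only <code>k</code> singular values survive the cutoff; the remaining <code>n - k</code> are numerical noise.</p>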
<p>How do our functions behave in this case? Instead of just looking at a corner, let&#8217;s use our gift of sight:<a href="http://blog.vene.ro/wp-content/uploads/2012/08/pseudoinverses.png"><img src="http://blog.vene.ro/wp-content/uploads/2012/08/pseudoinverses-300x218.png" alt="" title="Pseudoinverses" width="300" height="218" class="aligncenter size-medium wp-image-508" /></a></p>
<p>The small singular values are large enough that <code>inv</code> thinks the matrix is full rank. <code>pinv</code> does better but it still fails: you can see a group of high-amplitude noisy columns. <code>pinv2</code> is faster and it also gives us a useful result in this case.</p>
<p>Wait, does this mean that <code>pinv2</code> is simply better, and <code>pinv</code> is useless?</p>
<p>Not quite. Remember, we are now trying to actually invert matrices, and degrade gracefully in case of rank deficiency. But what if we need the pseudoinverse to solve an actual non-square, wide or tall system?</p>
<pre class="brush: python; title: ; notranslate">
In [19]: a = np.random.randn(1000, 50)

In [20]: timeit linalg.pinv(a)
10 loops, best of 3: 104 ms per loop

In [21]: timeit linalg.pinv(a.T)
100 loops, best of 3: 7.08 ms per loop

In [22]: timeit linalg.pinv2(a)
10 loops, best of 3: 114 ms per loop

In [23]: timeit linalg.pinv2(a.T)
10 loops, best of 3: 126 ms per loop
</pre>
<p>Huge victory for <code>pinv</code> in the wide case! Hurray! With all this insight, we can draw a line and see what we learned.</p>
<ul>
<li> If you are 100% sure that your matrix is invertible, use <code>inv</code> for a huge speed gain. The implementation of <code>inv</code> from Scipy is based on LAPACK&#8217;s <code>*getrf</code> + <code>*getri</code>, known to have good bounds.</li>
<li> If you are trying to solve a tall or wide system, use <code>pinv</code>.</li>
<li> If your matrix is square but might be rank deficient, use <code>pinv2</code> for speed and numerical gain.</li>
</ul>
<h2>Improving the symmetric case</h2>
<p>But wait a second, can&#8217;t we do better? \(AA^T\) is symmetric, can&#8217;t we make use of that to speed up the computation even more? Clearly, if \(A\) is symmetric and positive semidefinite (as any matrix of the form \(BB^T\) is), in its SVD \(A = USV^T\) we can take \(U = V\). But this is exactly the eigendecomposition of a symmetric matrix \(A\). The eigendecomposition can be computed more cheaply than the SVD using Scipy&#8217;s <code>eigh</code>, which uses LAPACK&#8217;s <code>*evr</code>. As part of my GSoC this year, with help from <a href="http://jakevdp.github.com/">Jake VanderPlas</a>, we made a <a href="https://github.com/scipy/scipy/pull/289">pull request to Scipy</a> containing a <code>pinvh</code> function that is equivalent to <code>pinv2</code> but faster for symmetric matrices.</p>
<pre class="brush: python; title: ; notranslate">
# a is once more a symmetric 1000 x 1000 matrix, built as in In [14]-[15]
In [24]: timeit linalg.pinv2(a)
1 loops, best of 3: 1.54 s per loop

In [25]: timeit linalg.pinvh(a)
1 loops, best of 3: 621 ms per loop

In [26]: np.testing.assert_array_almost_equal(linalg.pinv2(a), linalg.pinvh(a))
</pre>
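<p>The idea behind the symmetric shortcut can be sketched with NumPy alone. This is not the code from the pull request: <code>pinv_symmetric</code> is a hypothetical helper, and the cutoff and test matrix are assumptions for illustration.</p>

```python
import numpy as np


def pinv_symmetric(a, rcond=1e-10):
    """Pseudoinverse of a symmetric matrix from its eigendecomposition."""
    w, v = np.linalg.eigh(a)                     # a = V diag(w) V^T
    keep = np.abs(w) > rcond * np.abs(w).max()   # eigenvalues worth inverting
    w_inv = np.zeros_like(w)
    w_inv[keep] = 1.0 / w[keep]
    return (v * w_inv) @ v.T                     # V diag(w_inv) V^T


rng = np.random.default_rng(0)                   # arbitrary seed
b = rng.standard_normal((30, 4))
a = b @ b.T                                      # symmetric, rank 4

same_as_svd_route = np.allclose(pinv_symmetric(a),
                                np.linalg.pinv(a, rcond=1e-10))
```

<p>Taking absolute values of the eigenvalues in the cutoff means the sketch also copes with symmetric matrices that have negative eigenvalues.</p>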
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/08/18/inverses-pseudoinverses-numerical-issues-speed-symmetry/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The scikit-learn-speed ship has set sail! Faster than ever, with multi-step benchmarks!</title>
		<link>http://blog.vene.ro/2012/08/11/the-scikit-learn-speed-ship-has-set-sail-faster-than-ever-with-multi-step-benchmarks/</link>
		<comments>http://blog.vene.ro/2012/08/11/the-scikit-learn-speed-ship-has-set-sail-faster-than-ever-with-multi-step-benchmarks/#comments</comments>
		<pubDate>Sat, 11 Aug 2012 15:32:26 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[multi-step]]></category>
		<category><![CDATA[multistep]]></category>
		<category><![CDATA[vbench]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=490</guid>
		<description><![CDATA[I am pleased to announce that last night at 2:03 AM, the first fully automated run of the scikit-learn-speed test suite has run on our Jenkins instance! You can admire it at its temporary home for now. As soon as we verify that everything is good, we will move this to the official scikit-learn page....  <a href="http://blog.vene.ro/2012/08/11/the-scikit-learn-speed-ship-has-set-sail-faster-than-ever-with-multi-step-benchmarks/" title="Read The scikit-learn-speed ship has set sail! Faster than ever, with multi-step benchmarks!">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>I am pleased to announce that last night at 2:03 AM, the first fully automated run of the scikit-learn-speed test suite has run on our Jenkins instance! You can admire it at <a href="http://jenkins-scikit-learn.github.com/scikit-learn-speed/">its temporary home</a> for now. As soon as we verify that everything is good, we will move this to the official scikit-learn page.</p>
<p>I would like to take this opportunity to tell you about our latest changeset. We made running the benchmark suite tons simpler by adding a friendly Makefile. You can read more about its usage in the guide. But by far, our coolest new toy is: </p>
<h2>Multi-step benchmarks</h2>
<p>A standard vbench benchmark has three units of code, represented as strings: <code>code</code>, <code>setup</code> and <code>cleanup</code>. With the original timeit-based benchmarks, this means that for every run, the setup would be executed once. Then, the main loop runs <code>repeat</code> times, and within each iteration, the <code>code</code> is run <code>ncalls</code> times. Then <code>cleanup</code> happens, the best time is returned, and everybody is happy.</p>
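<p>As a rough illustration of the timeit-style loop, here is a sketch using only the standard library. The statement being timed and the values of <code>repeat</code> and <code>ncalls</code> are made up for the example, and this is not vbench code. Note also that the standard <code>timeit.repeat</code> re-runs <code>setup</code> before each of the <code>repeat</code> rounds.</p>

```python
import timeit

# Benchmark pieces as strings, in the spirit of a vbench benchmark.
setup = "data = list(range(1000))"
code = "sorted(data)"

repeat = 3      # outer loop: how many timing rounds
ncalls = 100    # inner loop: how many times `code` runs per round

# One total time per round; each round executes `code` ncalls times.
timings = timeit.repeat(stmt=code, setup=setup, repeat=repeat, number=ncalls)

# Report the best round, normalised to a single call.
best_per_call = min(timings) / ncalls
print("best of %d: %.3g s per call" % (repeat, best_per_call))
```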
<p>In scikit-learn, most of our interesting objects go through a state change called <em>fitting</em>. This metaphor is right at home in the machine learning field, where we separate the learning phase from the prediction phase. The prediction step cannot be invoked on an object that hasn&#8217;t been fitted.</p>
<p> For some algorithms, one of these steps is trivial. A brute force Nearest Neighbors classifier can be instantaneously fit, but prediction takes a while. On the opposite end we have linear models, with tons of complicated algorithms to fit them, but evaluation is a simple matrix-vector product that Numpy handles perfectly.</p>
<p>But for many of scikit-learn&#8217;s estimators, both steps are interesting. Let&#8217;s take Non-negative Matrix Factorization. It has three interesting functions: the <code>fit</code> that computes \(X \approx WH \), the <code>transform</code> that computes a non-negative projection on the components learned in <code>fit</code>, and <code>fit_transform</code> that takes advantage of the observation that when fitting, we also get the transformed \(X \) for free.</p>
<p>When benchmarking NMF, we initially had to design 3 benchmarks: </p>
<ul>
<li>
<code>setup = </code>standard, <code>code = obj.fit(X)</code></li>
<li>
<code>setup = </code>standard, <code>code = obj.fit_transform(X)</code></li>
<li>
<code>setup = </code>standard<code> + obj.fit(X)</code>, <code>code = obj.transform(X)</code></li>
</ul>
<h2>How much time were we wasting?</h2>
<p>For every benchmark, we time the code by running it 3 times. We run it once more to measure memory usage, once more for <code>cProfile</code> and one last time for <code>line_profiler</code>. This is a total of 6 times per benchmark. We need to multiply this by 2 again for running on two datasets. So when benchmarking <code>NMF</code>, because we need to fit before transforming, we do it 12 extra times. If a fit takes 5 seconds, this means one minute wasted on benchmarking just one estimator. <em>Wouldn&#8217;t it be nice to <code>fit</code>, <code>fit_transform</code> and <code>transform</code> in a sequence?</em></p>
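<p>Spelled out as a tiny calculation, using the run counts and the 5-second fit mentioned above (the variable names are just for illustration):</p>

```python
timing_runs = 3          # timed executions per benchmark
memory_runs = 1          # one run for memory usage
cprofile_runs = 1        # one run under cProfile
line_profiler_runs = 1   # one run under line_profiler
n_datasets = 2
fit_seconds = 5          # assumed cost of one fit

runs_per_benchmark = timing_runs + memory_runs + cprofile_runs + line_profiler_runs
extra_fits = runs_per_benchmark * n_datasets
wasted_seconds = extra_fits * fit_seconds

print(extra_fits, wasted_seconds)   # prints: 12 60
```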
<h2>Behind the scenes</h2>
<p>We made the <code>PythonBenchmark code</code> parameter also support getting a sequence of strings, instead of just a string. On the database side, every benchmark result entry gets an extra component in the primary key, the number of the step it measures.</p>
<p>In the benchmark description files, nothing is changed:</p>
<pre class="brush: python; title: ; notranslate">
{
    'obj': 'NMF',
    'init_params': {'n_components': 2},
    'datasets': ('blobs',),
    'statements': ('fit_unsup', 'transform_unsup', 'fit_transform')
},
</pre>
<p>But before, we would take the cartesian product of datasets and statements, and build a <code>Benchmark</code> object for every pairing. Now, we just pass the tuple as it is, and vbench is smart enough to do the right thing.<br />
We avoided the extra calls to <code>fit</code> in a lot of benchmarks. The whole suite now takes almost half the time to run!</p>
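<p>The difference between the two strategies can be sketched with plain tuples. The names <code>single_step</code> and <code>multi_step</code> are hypothetical and stand in for the <code>Benchmark</code> objects; this is not the actual vbench or scikit-learn-speed code.</p>

```python
from itertools import product

datasets = ("blobs",)
statements = ("fit_unsup", "transform_unsup", "fit_transform")

# Before: one benchmark per (dataset, statement) pairing, each with its own setup.
single_step = [(dataset, (statement,))
               for dataset, statement in product(datasets, statements)]

# Now: one benchmark per dataset, carrying the whole sequence of steps.
multi_step = [(dataset, statements) for dataset in datasets]

print(len(single_step), len(multi_step))   # prints: 3 1
```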
<p><em>Note:</em> This trick is currently hosted in the <code>abstract_multistep_benchmarks</code> vbench branch in my fork. </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/08/11/the-scikit-learn-speed-ship-has-set-sail-faster-than-ever-with-multi-step-benchmarks/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Profiler output, benchmark standard deviation and other goodies in scikit-learn-speed</title>
		<link>http://blog.vene.ro/2012/07/27/profiler-output-benchmark-standard-deviation-and-other-goodies-in-scikit-learn-speed/</link>
		<comments>http://blog.vene.ro/2012/07/27/profiler-output-benchmark-standard-deviation-and-other-goodies-in-scikit-learn-speed/#comments</comments>
		<pubDate>Fri, 27 Jul 2012 09:01:48 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[gsoc]]></category>
		<category><![CDATA[memory_profiler]]></category>
		<category><![CDATA[scikit-learn-speed]]></category>
		<category><![CDATA[vbench]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=477</guid>
		<description><![CDATA[This post is about the scikit-learn benchmarking project that I am working on, called scikit-learn-speed. This is a continuous benchmarking suite that runs and generates HTML reports using Wes McKinney&#8217;s vbench framework, to which I had to make some (useful, I hope) additions. What it looks like now You can check out a teaser/demo that...  <a href="http://blog.vene.ro/2012/07/27/profiler-output-benchmark-standard-deviation-and-other-goodies-in-scikit-learn-speed/" title="Read Profiler output, benchmark standard deviation and other goodies in scikit-learn-speed">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>This post is about the <a href="http://scikit-learn.org">scikit-learn </a>benchmarking project that I am working on, called <a href="https://github.com/vene/scikit-learn-speed">scikit-learn-speed</a>. This is a continuous benchmarking suite that runs and generates HTML reports using Wes McKinney&#8217;s <a href="http://wesmckinney.com/blog/?p=373">vbench</a> framework, to which I had to make some (useful, I hope) additions.</p>
<h2>What it looks like now</h2>
<p>You can check out a <a href="http://vene.github.com/scikit-learn-speed">teaser/demo</a> that was run on equidistant releases from the last two months. What has changed since the last version? Here&#8217;s a list in order of obviousness:</p>
<ul>
<li>We now use the lovely scikit-learn theme</li>
<li>Timing graphs now show the ±1 standard deviation range</li>
<li>cProfile output is displayed for all the benchmarks, so we can easily see at a glance what&#8217;s up</li>
<li>Said profiler output is collapsible using <a href="http://www.jqueryui.com/demos/accordion/">JQueryUI goodness</a></li>
<li>There now is an improved <a href="http://vene.github.com/scikit-learn-speed/quick_start.html">Quick Start guide</a> to running vbench on your machine</li>
</ul>
<h2>What made this possible</h2>
<p>I have done some more refactoring in my vbench fork, because I didn&#8217;t want to have a huge, monolithic <code>Benchmark</code> class that was specific to what we want in scikit-learn-speed. So on this branch, I set up a mixin/multiple inheritance hierarchy of benchmark classes. </p>
<p>The <code>Benchmark</code> class in vbench is now an abstract base class, with some common functionality and structure.<br />
Our <code>SklBenchmark</code> class is defined in scikit-learn-speed as:</p>
<p><code>class SklBenchmark(CProfileBenchmarkMixin,  MemoryBenchmarkMixin, PythonBenchmark): </code></p>
<p>Let&#8217;s read this from right to left:</p>
<ul>
<li><code>PythonBenchmark</code>: This class stores <code>code</code>, <code>setup</code> and <code>cleanup</code> Python code as strings, and implements simple timing mechanisms using the <code>time</code> module.</li>
<li>Bonus: <code>TimeitBenchmark</code>: This class extends <code>PythonBenchmark</code> with the <code>timeit</code> micro-benchmark timing method previously used in vbench. We turned this off in scikit-learn-speed.</li>
<li><code>MemoryBenchmarkMixin</code>: This adds memory benchmarking using <a href="http://pypi.python.org/pypi/memory_profiler">memory_profiler</a>.</li>
<li><code>CProfileBenchmarkMixin</code>: This runs the code through <a href="http://docs.python.org/library/profile.html#module-cProfile">cProfile</a> and implements mechanisms to report the output.</li>
</ul>
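<p>To show why reading the bases from right to left makes sense, here is a toy sketch of cooperative mixins. Only the class names come from the post; the <code>run</code> method, its return value and the constructor signature are assumptions invented for this example, not the real vbench interface.</p>

```python
class PythonBenchmark:
    """Base class: stores the code strings and does the basic timing step."""

    def __init__(self, code, setup="", cleanup=""):
        self.code, self.setup, self.cleanup = code, setup, cleanup

    def run(self):
        return {"steps": ["timing"]}


class MemoryBenchmarkMixin:
    def run(self):
        results = super().run()            # let the classes to the right go first
        results["steps"].append("memory")
        return results


class CProfileBenchmarkMixin:
    def run(self):
        results = super().run()
        results["steps"].append("cprofile")
        return results


class SklBenchmark(CProfileBenchmarkMixin, MemoryBenchmarkMixin, PythonBenchmark):
    pass


bench = SklBenchmark(code="obj.fit(X)")
steps = bench.run()["steps"]
mro_names = [cls.__name__ for cls in SklBenchmark.__mro__]
print(steps)   # prints: ['timing', 'memory', 'cprofile']
```

<p>Each mixin adds its own piece on top of whatever the classes to its right already produced, which is what keeps the base class small.</p>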
<p>The database is not flexible enough to adapt to arbitrary benchmark structure right now, so if anybody would like to help the effort, it would be much appreciated.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/07/27/profiler-output-benchmark-standard-deviation-and-other-goodies-in-scikit-learn-speed/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Scikit-learn-speed HTML reports teaser</title>
		<link>http://blog.vene.ro/2012/07/20/scikit-learn-speed-html-reports-teaser/</link>
		<comments>http://blog.vene.ro/2012/07/20/scikit-learn-speed-html-reports-teaser/#comments</comments>
		<pubDate>Fri, 20 Jul 2012 12:40:38 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[gsoc]]></category>
		<category><![CDATA[scikit-learn-speed]]></category>
		<category><![CDATA[vbench]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=470</guid>
		<description><![CDATA[EDIT: I made the plots a little more readable, check it out! Last time, I teased you with a screenshot of local output. Now, I will tease you with the benchmarks run on a couple of recent commits, along with some from earlier this year. After some effort and bugfixes, the project now reliably runs...  <a href="http://blog.vene.ro/2012/07/20/scikit-learn-speed-html-reports-teaser/" title="Read Scikit-learn-speed HTML reports teaser">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>EDIT: I made the plots a little more readable, check it out!</p>
<p>Last time, I teased you with a screenshot of local output. Now, I will tease you with the benchmarks run on a couple of recent commits, along with some from earlier this year.</p>
<p>After some effort and bugfixes, the project now reliably runs on different machines, so the next step, hosting it on a remote server and invoking it daily, is getting closer. In the meantime, you can have a look at <a href="http://vene.github.com/scikit-learn-speed/" title="scikit-learn-speed">the sample output</a>.</p>
<p>Note that, just like last time, the plots look jagged, but the differences are mostly minor and significant conclusions cannot be drawn yet. Once the suite starts running daily, the plots will become much more meaningful. I could waste time running the suite on more previous commits, but the results wouldn&#8217;t be comparable with the ones from the deployed system, because of hardware differences.</p>
<p>Playing around with this makes me want a couple of features in vbench. One is the possibility to overlay related benchmarks on the same plot (for example, different parameters for the same algorithm and data): this could be useful to spot patterns. A second one is some query / sorting support: see what are the most expensive benchmarks, see what benchmarks show the biggest jump in performance (but this could become a historical wall of fame or shame).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/07/20/scikit-learn-speed-html-reports-teaser/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Memory benchmarking with vbench</title>
		<link>http://blog.vene.ro/2012/07/05/memory-benchmarking-with-vbench/</link>
		<comments>http://blog.vene.ro/2012/07/05/memory-benchmarking-with-vbench/#comments</comments>
		<pubDate>Thu, 05 Jul 2012 10:38:06 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[memit]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[vbench]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=461</guid>
		<description><![CDATA[The scikit-learn-speed project now has memory usage benchmarking! This was accomplished by building on what I described in my recent posts, specifically the extensions to Fabian&#8217;s memory_profiler that you can find in my fork, but they will be merged upstream soon. The key element is the %magic_memit function whose development I blogged about on several...  <a href="http://blog.vene.ro/2012/07/05/memory-benchmarking-with-vbench/" title="Read Memory benchmarking with vbench">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>The <a href="https://github.com/vene/scikit-learn-speed">scikit-learn-speed project</a> now has memory usage benchmarking!</p>
<p>This was accomplished by building on what I described in my recent posts, specifically the extensions to Fabian&#8217;s <a href="https://github.com/fabianp/memory_profiler">memory_profiler</a> that you can find in <a href="https://github.com/vene/memory_profiler">my fork</a>, but they will be merged upstream soon. The key element is the <code>%magic_memit</code> function whose development I blogged about <a href="http://blog.vene.ro/2012/06/30/quick-memory-usage-benchmarking-in-ipython/" title="Quick memory usage benchmarking in IPython">on</a> <a href="http://blog.vene.ro/2012/07/02/more-on-memory-benchmarking/" title="More on memory benchmarking">several</a> <a href="http://blog.vene.ro/2012/07/04/on-why-my-memit-fails-on-osx/" title="On why my %memit fails on OSX">occasions</a>. I plugged this into <a href="http://wesmckinney.com/blog/?p=373">vbench</a> in a similar way to how the timings are computed, all with great success.</p>
<p>Here is a screenshot of the way a simple benchmark looks now, with just a few data points.</p>
<div id="attachment_464" class="wp-caption aligncenter" style="width: 610px"><a href="http://blog.vene.ro/wp-content/uploads/2012/07/vbench1.png"><img src="http://blog.vene.ro/wp-content/uploads/2012/07/vbench1.png" alt="A screenshot showing generated output from the scikit-learn-speed project, illustrating memory usage benchmarking." title="Memory benchmarking in scikit-learn-speed powered by vbench." width="600" height="859" class="size-full wp-image-464" /></a><p class="wp-caption-text">Memory benchmarking in scikit-learn-speed powered by vbench.</p></div>
<p>You can check it out and use it yourself for your benchmarks, but you need to use the vbench from the <a href="https://github.com/vene/vbench/tree/memory">memory branch on my fork</a>.</p>
<p>Of course, there are some important caveats. I am running this on my laptop, which runs OS X Lion, so, under the effect of <a href="http://blog.vene.ro/2012/07/04/on-why-my-memit-fails-on-osx/" title="On why my %memit fails on OSX">this bug</a>, I hardcoded the &#8216;<code>-i</code>&#8217; so the memory benchmarks are not realistic. Also, the y-range should probably be forced wider, because the plots look erratic, showing the very small noise at a large scale.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/07/05/memory-benchmarking-with-vbench/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>On why my %memit fails on OSX</title>
		<link>http://blog.vene.ro/2012/07/04/on-why-my-memit-fails-on-osx/</link>
		<comments>http://blog.vene.ro/2012/07/04/on-why-my-memit-fails-on-osx/#comments</comments>
		<pubDate>Wed, 04 Jul 2012 10:49:55 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[IPython]]></category>
		<category><![CDATA[magic]]></category>
		<category><![CDATA[memit]]></category>
		<category><![CDATA[mprun]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=452</guid>
		<description><![CDATA[In my last post I mentioned that I&#8217;m not satisfied with the current state of %memit, because some more complicated numerical function calls make it crash. I will start this post with a reminder of a pretty important bug: On MacOS X (10.7 but maybe more), after forking a new process, there is a segfault...  <a href="http://blog.vene.ro/2012/07/04/on-why-my-memit-fails-on-osx/" title="Read On why my %memit fails on OSX">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>In my <a href="http://blog.vene.ro/2012/07/02/more-on-memory-benchmarking/" title="More on memory benchmarking">last post</a> I mentioned that I&#8217;m not satisfied with the current state of <code>%memit</code>, because some more complicated numerical function calls make it crash. I will start this post with a reminder of a pretty important bug:</p>
<p><strong><a href="https://gist.github.com/2027412">On MacOS X (10.7 but maybe more), after forking a new process, there is a segfault in Grand Central Dispatch on the BLAS DGEMM function from Accelerate.</a><br />
</strong></p>
<p><strong>EDIT 1:</strong> In a hurry, I forgot to mention how <a href="http://twitter.com/ogrisel/">Olivier Grisel</a> and <a href="https://github.com/cournape">David Cournapeau</a> spent some time narrowing down this issue, starting from an <a href="https://github.com/scikit-learn/scikit-learn/issues/636">odd testing bug in scikit-learn</a>. They reported it to Apple, but there was, as of the date of this post, no reaction.</p>
<p><strong>EDIT 2:</strong> MinRK <a href="https://twitter.com/minrk/status/228265246819774464" title="Min's tweet">confirms</a>, and I verified shortly after, that this bug is fixed in Mountain Lion (10.8). Still not sure how far back it goes, though, so feedback is welcome.</p>
<p>When I first tried to make the <code>%memit</code> magic, I thought about simply measuring the current memory, running the command, and measuring the memory again. The problem is the results are not consistent, because Python <a href="http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm">tries to reuse already allocated memory whenever it can</a>. </p>
<p>Using memory_profiler, here&#8217;s an example illustrating this elastic memory management:</p>
<pre class="brush: python; title: ; notranslate">
# mem_test.py
import numpy as np


def make_a_large_array():
    return np.ones((1000, 1000))


def main():
    make_a_large_array()
    make_a_large_array()
    make_a_large_array()

# in IPython:
In [1]: import mem_test

In [2]: %mprun -f mem_test.main mem_test.main()
Filename: mem_test.py

Line #    Mem usage  Increment   Line Contents
==============================================
     8   24.8477 MB  0.0000 MB   def main():
     9   24.8633 MB  0.0156 MB       make_a_large_array()
    10   32.4688 MB  7.6055 MB       make_a_large_array()
    11   32.4688 MB  0.0000 MB       make_a_large_array()
</pre>
<p>In an IPython session, if you wanted to see how much memory <code>make_a_large_array()</code> uses, you might think you could simply run it a few times and take the maximum. However, if you happen to have already called <code>main()</code> once, you no longer get a meaningful result:</p>
<pre class="brush: python; title: ; notranslate">
In [3]: %mprun -f mem_test.main mem_test.main()
Filename: mem_test.py

Line #    Mem usage  Increment   Line Contents
==============================================
     8   32.4922 MB  0.0000 MB   def main():
     9   32.5234 MB  0.0312 MB       make_a_large_array()
    10   32.5234 MB  0.0000 MB       make_a_large_array()
    11   32.5234 MB  0.0000 MB       make_a_large_array()
</pre>
<p>So how can we get consistent results for the memory usage of a statement? We could run it in a fresh, new process. I implemented this in <code>%memit</code>, and the results are now consistent:</p>
<pre class="brush: python; title: ; notranslate">
In [5]: %memit mem_test.make_a_large_array()
maximum of 3: 8.039062 MB per loop

In [6]: %memit mem_test.make_a_large_array()
maximum of 3: 8.035156 MB per loop

In [7]: %memit mem_test.make_a_large_array()
maximum of 3: 8.042969 MB per loop
</pre>
<p>This way you can also realistically benchmark assignments: </p>
<pre class="brush: python; title: ; notranslate">
In [8]: %memit X = mem_test.make_a_large_array()
maximum of 3: 8.054688 MB per loop

In [9]: %memit X = mem_test.make_a_large_array()
maximum of 3: 8.058594 MB per loop

In [10]: %memit X = mem_test.make_a_large_array()
maximum of 3: 8.058594 MB per loop
</pre>
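<p>The idea of measuring in a fresh process can be sketched with the standard library alone. This is only an illustration under stated assumptions, not how <code>%memit</code> is implemented: the helper name <code>peak_traced_mb</code> is made up, the statement is passed as a string to a new interpreter, and the peak is taken from <code>tracemalloc</code>, which counts allocations made through Python rather than the process-level memory that memory_profiler reports.</p>
<pre class="brush: python; title: ; notranslate">
import subprocess
import sys

# Code run inside the child interpreter: start tracing allocations,
# execute the statement given on the command line, print the peak in bytes.
CHILD_TEMPLATE = """
import sys, tracemalloc
tracemalloc.start()
exec(sys.argv[1])
print(tracemalloc.get_traced_memory()[1])
"""


def peak_traced_mb(statement):
    """Run `statement` in a new interpreter and return its peak traced memory in MB.

    Hypothetical helper, for illustration only.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD_TEMPLATE, statement],
        capture_output=True, text=True, check=True,
    )
    # The statement itself may print; the peak is always the last line.
    peak_bytes = int(result.stdout.strip().splitlines()[-1])
    return peak_bytes / 1024.0 ** 2
</pre>
<p>Because every call starts from a new interpreter, repeated calls such as <code>peak_traced_mb("x = bytearray(64 * 1024 * 1024)")</code> should report roughly the same figure, no matter what the calling session allocated earlier.</p>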
<p>If we don&#8217;t spawn a subprocess, <code>del</code> doesn&#8217;t help, but allocating new variables does:</p>
<pre class="brush: python; title: ; notranslate">
In [11]: %memit -i X = mem_test.make_a_large_array()
maximum of 3: 7.632812 MB per loop

In [12]: del X

In [13]: %memit -i X = mem_test.make_a_large_array()
maximum of 3: 0.000000 MB per loop

In [14]: %memit -i Y = mem_test.make_a_large_array()
maximum of 3: 7.632812 MB per loop

In [15]: %memit -i Z = mem_test.make_a_large_array()
maximum of 3: 7.632812 MB per loop
</pre>
<p>Now, the problem is that when the function you are benchmarking contains calls to <code>np.dot</code> (matrix multiplication), the subprocess will consistently fail with SIGSEGV on affected OS X systems. Unfortunately, this covers pretty much everything I intended <code>%memit</code> for: numerical applications. For that reason, I have made <code>%memit</code> notify the user when all subprocesses fail and suggest using the <code>-i</code> flag, which runs the command in the current process instead.</p>
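<p>Detecting such a crash from the parent process could look roughly like the sketch below. It is not the code used in <code>%memit</code>: the function name <code>run_in_fresh_process</code> and the wording of the messages are made up for illustration, and it relies on the POSIX convention that <code>subprocess</code> reports a child killed by signal N with return code -N.</p>
<pre class="brush: python; title: ; notranslate">
import signal
import subprocess
import sys


def run_in_fresh_process(statement):
    """Execute `statement` in a new interpreter and return (ok, message).

    Hypothetical helper, for illustration only.
    """
    proc = subprocess.run([sys.executable, "-c", statement])
    if proc.returncode == 0:
        return True, "ok"
    if proc.returncode == -signal.SIGSEGV:
        # On POSIX, a return code of -N means the child died from signal N.
        return False, "subprocess crashed with SIGSEGV; try measuring in-process (-i)"
    return False, "subprocess exited with status %d" % proc.returncode
</pre>
<p>A caller that runs the statement several times can then fall back to an in-process measurement, or tell the user to, only when every attempt reports a failure.</p>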
<p>I think that, with this update, <code>%memit</code> is flexible enough for actual use, and therefore for merging into memory_profiler.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/07/04/on-why-my-memit-fails-on-osx/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>More on memory benchmarking</title>
		<link>http://blog.vene.ro/2012/07/02/more-on-memory-benchmarking/</link>
		<comments>http://blog.vene.ro/2012/07/02/more-on-memory-benchmarking/#comments</comments>
		<pubDate>Mon, 02 Jul 2012 09:27:34 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[IPython]]></category>
		<category><![CDATA[magic]]></category>
		<category><![CDATA[memit]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[memory_profiler]]></category>
		<category><![CDATA[mprun]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=444</guid>
		<description><![CDATA[Following up on my task to make it easier to benchmark memory usage in Python, I updated Fabian&#8217;s memory_profiler to include a couple of useful IPython magics. While in my last post, I used the new IPython 0.13 syntax for defining magics, this time I used the backwards-compatible one from the previous version. You can...  <a href="http://blog.vene.ro/2012/07/02/more-on-memory-benchmarking/" title="Read More on memory benchmarking">Read more &#187;</a>]]></description>
				<content:encoded><![CDATA[<p>Following up on my task to make it easier to benchmark memory usage in Python, I updated Fabian&#8217;s <a href="http://fseoane.net/blog/2012/line-by-line-report-of-memory-usage/">memory_profiler</a> to include a couple of useful IPython magics. While in my <a href="http://blog.vene.ro/2012/06/30/quick-memory-usage-benchmarking-in-ipython/" title="Quick memory usage benchmarking in IPython">last post</a>, I used the new IPython 0.13 syntax for defining magics, this time I used the backwards-compatible one from the previous version.</p>
<p>You can find this work-in-progress as a <a href="https://github.com/fabianp/memory_profiler/pull/13">pull request on memory_profiler</a> from where you can trace it to my GitHub repo. Here&#8217;s what you can do with it:</p>
<h2>%mprun</h2>
<p>In the spirit of <code><a href="http://packages.python.org/line_profiler/">%lprun</a></code> (imitation being the sincerest form of flattery), <code>%mprun</code> lets you view line-by-line memory usage reports without having to edit the source and add the <code>@profile</code> decorator.</p>
<p>For example:</p>
<pre class="brush: python; title: ; notranslate">

In [1]: import numpy as np

In [2]: from sklearn.linear_model import ridge_regression

In [3]: X, y = np.array([[1, 2], [3, 4], [5, 6]]), np.array([2, 4, 6])

In [4]: %mprun -f ridge_regression ridge_regression(X, y, 1.0)

(...)

   109   41.6406 MB  0.0000 MB           if n_features &gt; n_samples or \
   110   41.6406 MB  0.0000 MB              isinstance(sample_weight, np.ndarray) or \
   111   41.6406 MB  0.0000 MB              sample_weight != 1.0:
   112                           
   113                                       # kernel ridge
   114                                       # w = X.T * inv(X X^t + alpha*Id) y
   115                                       A = np.dot(X, X.T)
   116                                       A.flat[::n_samples + 1] += alpha * sample_weight
   117                                       coef = np.dot(X.T, _solve(A, y, solver, tol))
   118                                   else:
   119                                       # ridge
   120                                       # w = inv(X^t X + alpha*Id) * X.T y
   121   41.6484 MB  0.0078 MB               A = np.dot(X.T, X)
   122   41.6875 MB  0.0391 MB               A.flat[::n_features + 1] += alpha
   123   41.7344 MB  0.0469 MB               coef = _solve(A, np.dot(X.T, y), solver, tol)
   124                           
   125   41.7344 MB  0.0000 MB       return coef.T

</pre>
<h2>%memit</h2>
<p>As described in my previous post, this is a <code>%timeit</code>-like magic for quickly seeing how much memory a Python command uses.<br />
Unlike <code>%timeit</code>, however, the command needs to be executed in a fresh process. I still have to dig into this some more, but when the command runs in the current process, the measured difference in memory usage is very often insignificant, I assume because already-allocated memory is reused. The catch is that, when run in a new process, some of the functions I tried to benchmark crash with <code>SIGSEGV</code>. For many things, though, <code>%memit</code> is already usable:</p>
<pre class="brush: python; title: ; notranslate">
In [1]: import numpy as np

In [2]: X = np.ones((1000, 1000))

In [3]: %memit X.T
worst of 3: 0.242188 MB per loop

In [4]: %memit np.asfortranarray(X)
worst of 3: 15.687500 MB per loop

In [5]: Y = X.copy('F')

In [6]: %memit np.asfortranarray(Y)
worst of 3: 0.324219 MB per loop
</pre>
<p>With this small tool, it is easy to see which operations force a memory copy and which do not.</p>
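<p>Whether an operation copies can also be cross-checked without measuring memory at all, by asking NumPy if the result still shares its buffer with the input. The snippet below is a small sketch of that check using <code>np.shares_memory</code>; the variable names are only illustrative, and the arrays are smaller than in the session above to keep it quick.</p>
<pre class="brush: python; title: ; notranslate">
import numpy as np

X = np.ones((100, 100))   # C-contiguous (row-major) by default
Y = X.copy('F')           # Fortran-contiguous (column-major) copy

# A transpose is only a view with swapped strides, so no data is copied.
transpose_is_view = np.shares_memory(X, X.T)
# Converting a C-ordered 2-D array to Fortran order has to copy the data.
c_to_f_copies = not np.shares_memory(X, np.asfortranarray(X))
# An array that is already Fortran-ordered is passed through without a copy.
f_to_f_copies = not np.shares_memory(Y, np.asfortranarray(Y))

print(transpose_is_view, c_to_f_copies, f_to_f_copies)  # prints: True True False
</pre>
<p>This agrees with the <code>%memit</code> figures: only the conversion from C order to Fortran order shows a large increase.</p>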
<h2>Installation instructions</h2>
<p>First, you have to get the source code of this version of memory_profiler. The next steps depend on your version of IPython. If you have 0.10, edit <code>~/.ipython/ipy_user_conf.py</code> like this (once again, instructions <em>borrowed</em> from <a href="http://packages.python.org/line_profiler/">line_profiler</a>):</p>
<pre class="brush: python; title: ; notranslate">
# These two lines are standard and probably already there.
import IPython.ipapi
ip = IPython.ipapi.get()

# These three lines are the important ones.
import memory_profiler
ip.expose_magic('mprun', memory_profiler.magic_mprun)
ip.expose_magic('memit', memory_profiler.magic_memit)
</pre>
<p>If you&#8217;re using IPython 0.11 or newer, the steps are different. First create a configuration profile:</p>
<pre class="brush: bash; title: ; notranslate">
$ ipython profile create
</pre>
<p>Then create a file named <code>~/.ipython/extensions/memory_profiler_ext.py</code> with the following content:</p>
<pre class="brush: python; title: ; notranslate">
import memory_profiler

def load_ipython_extension(ip):
    ip.define_magic('mprun', memory_profiler.magic_mprun)
    ip.define_magic('memit', memory_profiler.magic_memit)
</pre>
<p>Then register it in <code>~/.ipython/profile_default/ipython_config.py</code>, as shown below. If you already have other extensions, such as <code>line_profiler_ext</code>, just add the new one to the existing list:</p>
<pre class="brush: python; title: ; notranslate">
c.TerminalIPythonApp.extensions = [
    'memory_profiler_ext',
]
c.InteractiveShellApp.extensions = [
    'memory_profiler_ext',
]
</pre>
<p>Now launch IPython and you can use the new magics as in the examples above.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/07/02/more-on-memory-benchmarking/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>
