<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>My programming and machine learning blog</title>
	<atom:link href="http://blog.vene.ro/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.vene.ro</link>
	<description>scikit-learn contributor</description>
	<lastBuildDate>Sun, 15 Apr 2012 23:13:01 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>GSoC 2012 proposal: Need for scikit-learn speed</title>
		<link>http://blog.vene.ro/2012/04/16/gsoc-2012-proposal-need-for-scikit-learn-speed/</link>
		<comments>http://blog.vene.ro/2012/04/16/gsoc-2012-proposal-need-for-scikit-learn-speed/#comments</comments>
		<pubDate>Sun, 15 Apr 2012 22:37:51 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[gsoc]]></category>
		<category><![CDATA[proposal]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=376</guid>
		<description><![CDATA[This summer I hope to be able to put in another full-time amount of effort into scikit-learn. After a successful Google Summer of Code project last year on dictionary learning, I now plan to do some low-level work. The title of my proposal is: &#8220;Need for scikit-learn speed&#8221; and, in a nutshell, will make the...  <a href="http://blog.vene.ro/2012/04/16/gsoc-2012-proposal-need-for-scikit-learn-speed/" title="Read GSoC 2012 proposal: Need for scikit-learn speed">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>This summer I hope to be able to put in another full-time amount of effort into scikit-learn. After a successful Google Summer of Code project last year on dictionary learning, I now plan to do some low-level work. The title of my proposal is: &#8220;Need for scikit-learn speed&#8221; and, in a nutshell, will make the scikit go faster and will help it stay that way.</p>
<p>Scikit-learn has always enforced standards of quality that kept all implementations at a non-trivial level (i.e. faster than using <a href="http://docs.scipy.org/doc/scipy/reference/optimize.html">the generic optimizers in scipy</a>). However, not all modules are equal: some have received more attention for speed than others (for example the SGD classes). I intend to raise the bar towards a more uniform level.</p>
<h2>Are you crazy, can you really do this?</h2>
<p>Well, of course. This might not the usual GSoC proposal, but I can show how I can do it and how it&#8217;s easily quantifiable. Actually, a very important part of the work will be to make scikit-learn&#8217;s speed easily measurable.</p>
<p>As for the specific speed-ups, I have shown <a href="http://blog.vene.ro/2011/08/07/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-1/" title="Optimizing Orthogonal Matching Pursuit code in Numpy, part 1">in</a> <a href="http://blog.vene.ro/2011/08/11/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-2/" title="Optimizing Orthogonal Matching Pursuit code in Numpy, part 2">the</a> <a href="http://blog.vene.ro/2011/11/18/the-nasty-bug-crawling-in-my-orthogonal-matching-pursuit-code/" title="The nasty bug crawling in my Orthogonal Matching Pursuit code">past</a> that I can do algorithmic and memory layout optimizations in numerical code. There are parts in the scikit-learn that can benefit from such work: for example only recently Peter merged this <a href="https://github.com/scikit-learn/scikit-learn/pull/545">pull request</a> significantly improving SGDClassifier&#8217;s test time performance by switching the memory layout of the coefficients: they were laid out optimally for the training phase, not for the prediction phase.</p>
<p>There are certainly more opportunities for such speed improvements in the scikit. Of course there is a lot of code that can&#8217;t reasonably be made any faster (I have a feeling that SGDClassifier is at the moment such a case, but we can&#8217;t know for sure without heavy profiling). But generally there are many speed fixes that could weigh a lot: for example, a <a href="http://cython.org/">Cython</a> implementation of the <code>euclidean_distances</code> function that is able to use preallocated memory will improve the performance of raw NearestNeighbours queries as well as of the KMeans and hierarchical clustering algorithms.</p>
<h2>How will we be able to tell if you succeed?</h2>
<p>A key part of the GSoC project is setting up a <abbr title="Continuous Integration">CI</abbr>-style benchmark platform. The point is to be able to track how the speed of certain operations evolves in time. For such purposes, Wes McKinney developed the <a href="https://github.com/pydata/vbench">vbench</a> project, introduced in <a href="http://wesmckinney.com/blog/?p=373">this blog post</a>. The goal is for every scikit-learn module to have several such benchmarks, for differently shaped and structured data.</p>
<p>Having such a benchmark suite available is the equivalent of a test suite, in terms of performance. It makes developers be extra conscious of the effect of their changes. It also makes it more fun to chase speed improvements, thanks to the positive reinforcement it gives.</p>
<p>There are some static benchmarks comparing the performance of scikit-learn algorithms with other well-known libraries in the <a href="http://scikit-learn.sourceforge.net/ml-benchmarks/">ml-benchmarks</a> project. It would be very helpful to have such a benchmark suite that automatically keeps up-to-date.</p>
<h2>Side effects</h2>
<p>The cool thing about such a project is that it should raise the overall quality of the scikit. The refactoring will lead to an increase in test coverage, because the low-coverage modules are expected to be less optimized as well. Also, the benchmarks will lead to well-backed summaries in the documentation, such as <a href="http://scikit-learn.org/dev/modules/clustering.html#overview-of-clustering-methods">the one recently added in the clustering section</a>.</p>
<p>Since the scikit is reaching a state where many well-known algorithms are available, the <strong>1.0</strong> release is slowly approaching. My Google Summer of Code project should bring the scikit significantly closer to that milestone.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/04/16/gsoc-2012-proposal-need-for-scikit-learn-speed/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Romanian people and coffee</title>
		<link>http://blog.vene.ro/2012/04/13/romanian-people-and-coffee/</link>
		<comments>http://blog.vene.ro/2012/04/13/romanian-people-and-coffee/#comments</comments>
		<pubDate>Fri, 13 Apr 2012 19:20:23 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=368</guid>
		<description><![CDATA[So I got my hands of the Google N-gram data for the Romanian language. It&#8217;s noisy as hell, has some other subtle issues too, but here&#8217;s the first thing I noticed: The Romanian word for coffee is cafea, and the more you crave it, the longer you pronunce the final a: I really need some...  <a href="http://blog.vene.ro/2012/04/13/romanian-people-and-coffee/" title="Read Romanian people and coffee">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>So I got my hands of the <a href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html" title="Google Research">Google N-gram data</a> for the Romanian language. It&#8217;s noisy as hell, has some other subtle issues too, but here&#8217;s the first thing I noticed:</p>
<p>The Romanian word for coffee is <em>cafea</em>, and the more you crave it, the longer you pronunce the final <em>a</em>: I really need some <em>cafeaaaa</em> right now.</p>
<p>Thanks to Google, here are the numbers:</p>
<a href="http://blog.vene.ro/wp-content/uploads/2012/04/cafeaaa.png"><img src="http://blog.vene.ro/wp-content/uploads/2012/04/cafeaaa-300x218.png" alt="Distribution of the length of the final letter in the Romanian word for coffee." title="cafeaaa" width="300" height="218" class="aligncenter size-medium wp-image-369" /></a>
<p>Post scriptum: I hope you like the theme: I installed the <a href="http://www.huyng.com/posts/sane-color-scheme-for-matplotlib/" title="www.huyng.com/posts/sane-color-scheme-for-matplotlib">sane matplotlib color scheme</a> from Huy Nguyen.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/04/13/romanian-people-and-coffee/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Nash-Williams theorem on the Hamiltonian property of some regular graphs</title>
		<link>http://blog.vene.ro/2012/01/29/nash-williams-theorem-on-the-hamiltonian-property-of-some-regular-graphs/</link>
		<comments>http://blog.vene.ro/2012/01/29/nash-williams-theorem-on-the-hamiltonian-property-of-some-regular-graphs/#comments</comments>
		<pubDate>Sun, 29 Jan 2012 20:31:09 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[graph]]></category>
		<category><![CDATA[graph theory]]></category>
		<category><![CDATA[hamiltonian]]></category>
		<category><![CDATA[nash-williams]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=348</guid>
		<description><![CDATA[I have been digging on the internet for the proof of this theorem for the last couple of days without success. The result was published by Sir Crispin Nash-Williams as Valency Sequences which force graphs to have Hamiltonian Circuits. Interim Rep, University of Waterloo Res Rep., Waterloo, Ontario, 1969. However, this old paper is unavailable...  <a href="http://blog.vene.ro/2012/01/29/nash-williams-theorem-on-the-hamiltonian-property-of-some-regular-graphs/" title="Read Nash-Williams theorem on the Hamiltonian property of some regular graphs">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>I have been digging on the internet for the proof of this theorem for the last couple of days without success. The result was published by Sir Crispin Nash-Williams as <em>Valency Sequences which force graphs to have Hamiltonian Circuits</em>. Interim Rep, University of Waterloo Res Rep., Waterloo, Ontario, 1969. However, this old paper is unavailable online but I have a proof in some lecture notes from my class, that I want to share here.</p>
<p><strong>Theorem.</strong> Let \(G=(V, E)\) be an \(n\)-regular graph with \(|V| = 2n + 1\). Then, \(G\) is Hamiltonian.</p>
<p><strong>Proof.</strong> We first remark that \(n\) must be even, since $$\sum_{x \in V} d(x) = n(2n + 1) = 2|E|$$ We might try to apply Dirac&#8217;s theorem, which would give us a Hamiltonian cycle if \( \forall x \in V, d(x) \geq \frac{|V|}{2}\). But in the current case, \(\forall x \in V, d(x) = n < \frac{2n+1}{2}\).</p>
<p>So we force Dirac by adding an extra vertex \(w\) and connecting it to all \( x \in V \). In this new graph \(G'\), \(d(x) = n + 1 \forall x \in V\) and \(d(w) = 2n + 1\). Therefore we have a Hamiltonian cycle that passes through \(w\) and in which, \(w\) is adjacent to two vertices \(x\) and \(y \in V\). Therefore this cycle induces a Hamiltonian path in \(G\): $$P = [x = v_0, v_1, ..., v_{2n-1}, v_{2n}=y] $$</p>
<p>Suppose that \(G\) is not Hamiltonian. It follows that if \( v_0v_i \in E \), then \( v_{i-1}v_{2n} \notin E\) and also that if \( v_0v_i \notin E \), then \( v_{i-1}v_{2n} \in E\).</p>
<p>We have two cases. If \(v_0\) is adjacent to \(v_1, ..., v_n\) then it follows that \(v_{2n}\) is adjacent to \(v_n, v_{n+1}, ..., v_{2n-1}\), since it cannot be adjacent to any \(v_i, i < n\) without creating a Hamiltonian cycle. But in this case, in the graph induced by the first half \(G[\{v_0, v_1, ... v_n\}]\), \(v_n\) cannot be adjacent to all the others, since in \(G\) it has degree \(n\) and it already has \(2\) outgoing edges. So there is at least one vertex \(v_i, i < n\) that isn't adjacent to it, which means \(v_i\) is adjacent to some \(v_j, j > n\), thus forming a Hamiltonian cycle.</p>
<p>In the second case, we have a vertex \(v_i, 2 \leq i \leq 2n &#8211; 1\) such that \(v_0v_i \notin E\) and \(v_0v_{i+1} \in E\). This also means that \(v_{i-1}v_{2n} \in E\).</p>
<p>We therefore have a cycle of length \(2n\) in \(G\) that excludes \(v_i\). Let&#8217;s rename this cycle \(C=[y_1, y_2, ..., y_{2n}, y_1]\) and \(v_i=y_0\).</p>
<p>\(y_0\) cannot be adjacent to two consecutive vertices \(y_i\) and \(y_{i+1}\) because this will give a Hamiltonian cycle. But we know that \(deg(y_0) = n\). It follows that it&#8217;s adjacent to all of the even or odd numbered vertices. We assume the latter, without loss of generality. Let \(2k\) be some even index. Notice that we have \(\{y_0y_{2k-1}, y_0y_{2k+1}\} \subset E\) and we can follow the cycle \(C\) from \(y_{2k+1}\) all the way back to \(y_{2n-1}\) giving us a new cycle \(C&#8217; = [y_1, y_2, ..., y_{2n-1}, y_0, y_{2k+1}, ..., y_{2n}, y_1]\) also of length \(2n\). So by repeating the same reasoning for every even vertex, by placing it in the middle and building a cycle around it, it follows that every even vertex is adjacent to all the odd vertices. But there are \(n+1\) even indices, so it follows that the degree of any odd vertex is at least \(n+1\), contradicting the initial conditions of the theorem. \(\square\)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/01/29/nash-williams-theorem-on-the-hamiltonian-property-of-some-regular-graphs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Moving out</title>
		<link>http://blog.vene.ro/2012/01/06/moving-out/</link>
		<comments>http://blog.vene.ro/2012/01/06/moving-out/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 22:16:28 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.vene.ro/?p=344</guid>
		<description><![CDATA[Happy new year, friends! I&#8217;ve made a New Year&#8217;s resolution to build a better web presence and make better use of the domain that I previously only used for mail. This has prompted me to move my blog over to http://blog.vene.ro which hopefully is shorter, better, faster and stronger!]]></description>
			<content:encoded><![CDATA[<p>Happy new year, friends!</p>
<p>I&#8217;ve made a New Year&#8217;s resolution to build a better web presence and make better use of the domain that I previously only used for mail.</p>
<p>This has prompted me to move my blog over to http://blog.vene.ro which hopefully is shorter, better, faster and stronger!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2012/01/06/moving-out/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The nasty bug crawling in my Orthogonal Matching Pursuit code</title>
		<link>http://blog.vene.ro/2011/11/18/the-nasty-bug-crawling-in-my-orthogonal-matching-pursuit-code/</link>
		<comments>http://blog.vene.ro/2011/11/18/the-nasty-bug-crawling-in-my-orthogonal-matching-pursuit-code/#comments</comments>
		<pubDate>Fri, 18 Nov 2011 18:51:47 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[dictionary learning]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[omp]]></category>
		<category><![CDATA[orthogonal matching pursuit]]></category>

		<guid isPermaLink="false">http://venefrombucharest.wordpress.com/?p=310</guid>
		<description><![CDATA[A while back, Bob L. Sturm blogged about a similar implementation of OMP to the one in scikit-learn. Instead of using the Cholesky decomposition like we did, his Matlab code uses the QR decomposition, to a similar (or maybe even identical) outcome, in theory. So lucky that Alejandro pointed out to him the existence of...  <a href="http://blog.vene.ro/2011/11/18/the-nasty-bug-crawling-in-my-orthogonal-matching-pursuit-code/" title="Read The nasty bug crawling in my Orthogonal Matching Pursuit code">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>A while back, Bob L. Sturm blogged about a <a href="http://media.aau.dk/null_space_pursuits/2011/10/efficient-omp.html">similar implementation of OMP</a> to the one in scikit-learn. Instead of using the Cholesky decomposition like we did, his Matlab code uses the QR decomposition, to a similar (or maybe even identical) outcome, in theory. So lucky that Alejandro pointed out to him the existence of the scikit-learn implementation, and that Bob&#8217;s code <a href="http://media.aau.dk/null_space_pursuits/2011/10/omp-in-python-strange-results.html">exposed a bug</a> that all the test coverage didn&#8217;t catch! This plot should increase, certainly not decrease! Something is clearly wrong here.<br />
<a href="http://media.aau.dk/null_space_pursuits/2011/10/omp-in-python-strange-results.html"><img alt="OMP buggy phase transition, decreasing instead of increasing" src="http://media.aau.dk/null_space_pursuits/2011/10/17/OMPscikit.png" title="OMP buggy phase transition" class="aligncenter" width="400" /></a><br />
Luckily we were able to find it and <a href="http://media.aau.dk/null_space_pursuits/2011/10/to-the-rescue.html">fix it</a> very quickly. I have updated the old entries I wrote on the OMP optimizations, so they no longer include the bug. But I take this opportunity to explain what exactly went wrong.</p>
<p>A key part of the optimization was that slicing out arbitrary columns out of an array is slow when they are passed to BLAS functions like matrix multiplication. In order to make the most out of your code, the data should have a contiguous layout. We achieved this by swapping active dictionary atoms (columns) to the beginning of the array.</p>
<p>Something that can happen, but won&#8217;t happen very often, is that after an atom is selected as active, the atom that takes its place after swapping needs to be selected. This is rare because dictionaries have many columns, out of which only very very few will be active. But when it happens, because the code didn&#8217;t keep track of swapped indices, the corresponding coefficient of the solution would get updated twice, leading to more zero entries than we should have. A keen eye could have noticed that the first `n_nonzero_coefs` entries in OMP solution vectors were never non-zero. But alas, my eye was not a keen one at all.</p>
<p>In other words, the following test (that was written after the bug was found, unfortunately) was failing:</p>
<pre class="brush: python; title: ; notranslate">
def test_swapped_regressors():
    gamma = np.zeros(n_features)
    # X[:, 21] should be selected first, then X[:, 0] selected second,
    # which will take X[:, 21]'s place in case the algorithm does
    # column swapping for optimization (which is the case at the moment)
    gamma[21] = 1.0
    gamma[0] = 0.5
    new_y = np.dot(X, gamma)
    new_Xy = np.dot(X.T, new_y)
    gamma_hat = orthogonal_mp(X, new_y, 2)
    gamma_hat_gram = orthogonal_mp_gram(G, new_Xy, 2)
    # active indices should be [0, 21], but prior to the bugfix
    # the algorithm would update only [21] but twice
    assert_equal(np.flatnonzero(gamma_hat), [0, 21])
    assert_equal(np.flatnonzero(gamma_hat_gram), [0, 21])
</pre>
<p>Note that this bug has been fixed for a while, but I didn&#8217;t get the free time to write this post until now. Good news is: we fixed it, and did so very quickly after the report. So you can still trust me, I guess!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2011/11/18/the-nasty-bug-crawling-in-my-orthogonal-matching-pursuit-code/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Sampling Gamma random variates through the ratio-of-uniforms method</title>
		<link>http://blog.vene.ro/2011/10/09/sampling-gamma-random-variates-through-the-ratio-of-uniforms-method/</link>
		<comments>http://blog.vene.ro/2011/10/09/sampling-gamma-random-variates-through-the-ratio-of-uniforms-method/#comments</comments>
		<pubDate>Sun, 09 Oct 2011 13:40:47 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[python]]></category>
		<category><![CDATA[monte carlo]]></category>
		<category><![CDATA[numpy]]></category>
		<category><![CDATA[random sampling]]></category>
		<category><![CDATA[ratio-of-uniforms]]></category>
		<category><![CDATA[scipy]]></category>

		<guid isPermaLink="false">http://venefrombucharest.wordpress.com/?p=261</guid>
		<description><![CDATA[One year ago I had the chance to take a class on Monte Carlo simulation with prof. Ion Văduva, and my assignment for the class was to implement exactly what it says in the title of the blog post. I am going to walk you through the idea behind this. General formulation The ratio-of-uniforms is...  <a href="http://blog.vene.ro/2011/10/09/sampling-gamma-random-variates-through-the-ratio-of-uniforms-method/" title="Read Sampling Gamma random variates through the ratio-of-uniforms method">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>One year ago I had the chance to take a class on Monte Carlo simulation with prof. Ion Văduva, and my assignment for the class was to implement exactly what it says in the title of the blog post. I am going to walk you through the idea behind this.</p>
<h3>General formulation</h3>
<p>The ratio-of-uniforms is a method that can be applied to many density functions. Essentially, given a density function over \( \mathbb{R}^m\), \( f(x) = \frac{h(x)}{H}\) where \(H\) is a normalization constant (ie. \( h(x) \geq 0\), \( H = \int h(x)dX\)). Given a parameter \( c > 0 \) and a parametrization \(\phi\) from \( \mathbb{R}^{m+1}\) to \( \mathbb{R}^{m}\) expressed as: $$ \phi(v_0, v_1, &#8230;, v_m) = \left ( \frac{v_1}{v_0^c}, \frac{v_2}{v_0^c}, &#8230;, \frac{v_m}{v_0^c} \right )$$<br />
Define the set \( \mathcal{C} = \{\mathbf{v} \big | \gamma(\mathbf{v}) \leq 0, v_0 > 0\} \in \mathbb{R}^{m+1}\) where<br />
$$\gamma(\mathbf{v}) = \log v_0 &#8211; \frac{\log h(\phi(v_0, v_1, &#8230;, v_m))}{mc + 1}$$ If \( \mathcal{C}\) is bounded and we sample a uniform vector \( \mathbf{V} \sim \text{Uniform}(\mathcal{C})\) then \( \phi(\mathbf{V}) \sim f(x)\). Also note that the measure (volume) of the set \( \mathcal{C}\) is \( \frac{H}{mc + 1}\). I do not have any references for the proof, except for a book in Romanian, but if you are interested, just leave me a comment and I&#8217;ll do a follow-up post with the proofs.</p>
<h3>Univariate scenario</h3>
<p>For the univariate case, all the above simplifies to $$ \mathcal{C} = \left \{(u, v) \Big | 0 < u < \sqrt<br />
{h\left (\frac{v}{u^c}\right )} \right \} $$. We generate \( (U, V) \sim \text{Uniform}(\mathcal{C})\) and take \( \frac{V}{U^c} \sim f(x)\).<br />
Since we are looking at the (univariate) Gamma distribution, described by: $$ f(x; \nu, \theta) = \frac{x^{\mu - 1} \exp(\frac{-x}{\theta})}{\theta^k\Gamma(k)}$$ \( \nu\) is the shape parameter and \( \theta\) is the scale parameter.<br />
But because of the property that if \( X \sim \text{Gamma}(\nu, \theta)\), then for any \( k > 0\), \( kX \sim \text{Gamma}(\nu, k\theta)\), we conclude that we can fix \( \theta\) to 1 without loss of generality. Replacing in the style of the definition in the previous section, we have \( h(x; \nu) = x^{\nu-1}e^{-x}\) and \( H_\nu = \Gamma(\nu)\).<br />
This allows us to compute the equation of the boundary of the set \( \mathcal{C}\) which ends up being described by \(\gamma(u, v) = \log{u} &#8211; \frac{\nu &#8211; 1}{c + 1} \log{\left(\frac{v}{u^c}\right)} + \frac{1}{c+1} \frac{v}{u^c}\). For visualisation purposes, here is how it would look like for \( \nu=6, c=1\) (plotted using <a title="Wolfram Alpha" href="http://www.wolframalpha.com/">Wolfram Alpha</a>):<a href="http://blog.vene.ro/wp-content/uploads/2011/10/regiunea.png"><img class="aligncenter size-full wp-image-289" title="The accepting set C" src="http://blog.vene.ro/wp-content/uploads/2011/10/regiunea.png" alt="" width="294" height="300" /></a></p>
<h3>Sampling algorithm</h3>
<p>In order to uniformly sample from this set, we can apply basic rejection sampling: just uniformly sample from a rectangular region surrounding the set, and reject the points that do not satisfy the condition. In order to do this as efficiently as possible, we need to compute the minimal bounding box, which can be done by solving a couple of optimization problems using Lagrange multipliers and the KKT conditions. Also by looking closely at the image, you can see that the lower left corner is exactly the origin: this turns out not to be a coincidence. I won&#8217;t go into detail here, but here are the bounds I derived:<br />
$$ 0 < u < (\nu - 1)^\frac{\nu - 1}{c + 1} e ^ {-\frac{\nu - 1}{c + 1}} \text{ and } 0< v < \left(\frac{c\nu + 1}{c}\right)^{\frac{c\nu + 1}{c + 1}} e ^ {- \frac {c\nu + 1}{c+1}}$$</p>
<p>The probability of acceptance (which can be seen as the efficiency) of the rejection sampling method is given by the ratio of the areas of the set \( \mathcal{C}\) and the bounding box. The larger this probability, the less points we throw away and the more efficient the algorithm is. Using the values derived above, this probability is: $$ p(\nu, c) = \frac{\Gamma(\nu)e^{\nu}}{(c+1) (\nu - 1)^{\frac{\nu - 1}{c + 1}} \left(\frac{c\nu + 1}{c}\right)^{\frac{c\nu + 1}{c + 1}}}$$</p>
<p>Personally I got stumped here. The idea would be to determine the ideal \( c\) for a given \( \nu\) in order to maximize the probability, but I didn't manage to do it (I leave it as an exercise for the reader <img src='http://blog.vene.ro/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> ). Anyway, this is enough to proceed with an implementation, so I'm gonna give the Python code for it. Note that I used the name k for the shape parameter instead of \( \nu\). Also note that the case when \( 0 < \nu < 1\) needed to be treated separately, which I did using the following property: Let \( \nu \in (0, 1)\). If \( X' \sim \text{Gamma}(1+\nu, 1), U \sim \text{Uniform}(0, 1)\) then $$ X = X' \cdot \sqrt[\nu]{U} \sim \text{Gamma}(\nu, 1)$$ For a proof of this fact, see [<a href="#footnote-1">1</a>], which is a great article on generating Gamma variates.</p>
<h3>Implementation</h3>
<pre class="brush: python; title: ; notranslate">
from import numpy as np

def _cond(u, v, k, c):
    &quot;&quot;&quot;Identity function describing the acceptance region&quot;&quot;&quot;
    x = v / u ** c
    return (c + 1) * np.log(u) &lt;= (k - 1) * np.log(x) - x

def vn_standard_gamma(k, c=1.0, rng=np.random):
    &quot;&quot;&quot;Generates a single standard gamma random variate&quot;&quot;&quot;
    if k &lt;= 0:
        raise ValueError(&quot;Gamma shape should be positive&quot;)
    elif k &lt; 1:
        return vn_standard_gamma(1 + k, c, rng) * rng.uniform() ** (1 / k)
    elif k == 1:
        return rng.standard_exponential()
    else:
        a, b = get_bounds(k, c)
        while True:
            u, v = rng.uniform(0, a), rng.uniform(0, b)
            if _cond(u, v, k, c):
                break;
        return v / u ** c

def vn_gamma(k, t, shape=1, c=1.0, rng=np.random):
    &quot;&quot;&quot;Vectorized function to generate multiple gamma variates&quot;&quot;&quot;
    generator = lambda x: t * vn_standard_gamma(k, c, rng)
    generator = np.vectorize(generator)
    return generator(np.empty(shape))

def get_bounds(k, c=1.0):
    &quot;&quot;&quot;Computes the minimal upper bounds surrounding the acceptance region&quot;&quot;&quot;
    a = ((k - 1) / np.e) ** ((k - 1) / (c + 1))
    b = ((c * k + 1) / (c * np.e)) ** ((c * k + 1) / (c + 1))
    return a, b

def prob_acc(k, c=1.0):
    &quot;&quot;&quot;Calculates the probability of acceptance for the given parameters&quot;&quot;&quot;
    from scipy.special import gamma
    a, b = get_bounds(k, c)
    return gamma(k) / ((c + 1) * a * b)
</pre>
<h3>Results</h3>
<p>And of course I should show you that it works. Here are some histograms for various values of \( \nu\), with the theoretical density plotted in dotted red, after sampling \( 10^5\) values. The y-axis is the frequency (sorry for labeling in Romanian), and for the red dotted line it can be interpreted as the theoretical probability. You can clearly see the goodness of fit is excellent.</p>
<a href="http://blog.vene.ro/wp-content/uploads/2011/10/hist1.png"><img class="aligncenter size-full wp-image-301" title="Histogram for nu=6" src="http://blog.vene.ro/wp-content/uploads/2011/10/hist1.png" alt="" width="471" height="356" /></a><a href="http://blog.vene.ro/wp-content/uploads/2011/10/hist2.png"><img class="aligncenter size-full wp-image-302" title="Histogram for nu=100" src="http://blog.vene.ro/wp-content/uploads/2011/10/hist2.png" alt="" width="471" height="356" /></a><a href="http://blog.vene.ro/wp-content/uploads/2011/10/hist3.png"><img class="aligncenter size-full wp-image-303" title="Histogram for nu=0.66" src="http://blog.vene.ro/wp-content/uploads/2011/10/hist3.png" alt="" width="471" height="356" /></a>
<p><span id="footnote-1">[1]</span>: George Marsaglia and Wai Wan Tsang. 1998. <a href="http://www.jstatsoft.org/v03/i03/paper">The Monty Python method for generating random variables</a>. ACM Trans. Math. Softw. 24, 3 (September 1998), 341-350. <a href="http://doi.acm.org/10.1145/292395.292453">DOI=10.1145/292395.292453</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2011/10/09/sampling-gamma-random-variates-through-the-ratio-of-uniforms-method/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RANLP 2011 in Hissar, BG</title>
		<link>http://blog.vene.ro/2011/09/20/ranlp-2011-in-hissar-bg/</link>
		<comments>http://blog.vene.ro/2011/09/20/ranlp-2011-in-hissar-bg/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 12:17:35 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[conferences]]></category>
		<category><![CDATA[nlp]]></category>

		<guid isPermaLink="false">http://venefrombucharest.wordpress.com/?p=243</guid>
		<description><![CDATA[Last week was marked by the international RANLP (Recent Advances in Natural Language Processing) conference, taking place in a nice spa in Hissar, Bulgaria. The excellent folks from the computational linguistics group at the University of Wolverhampton were behind it, together with the Institute of Information and Communication Technologies from the Bulgarian Academy of Sciences....  <a href="http://blog.vene.ro/2011/09/20/ranlp-2011-in-hissar-bg/" title="Read RANLP 2011 in Hissar, BG">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>Last week was marked by the international RANLP (Recent Advances in Natural Language Processing) conference, taking place in a nice spa in Hissar, Bulgaria. The excellent folks from the <a href="http://clg.wlv.ac.uk/">computational linguistics group</a> at the University of Wolverhampton were behind it, together with the <a href="http://www.iict.bas.bg/EN/index.html">Institute of Information and Communication Technologies</a> from the Bulgarian Academy of Sciences.<br />
<div id="attachment_247" class="wp-caption alignleft" style="width: 235px"><a href="http://blog.vene.ro/wp-content/uploads/2011/09/319030_10150303323328171_677848170_8151131_1046590975_n1.jpg"><img src="http://blog.vene.ro/wp-content/uploads/2011/09/319030_10150303323328171_677848170_8151131_1046590975_n1.jpg?w=225" alt="" title="Warm spring in Hissar" width="225" height="300" class="size-medium wp-image-247" /></a></p>
<p class="wp-caption-text">A fountain with warm mineral spring water in Hissarya. <br /><small>Picture by <a href="http://pers-www.wlv.ac.uk/~ex0233/" title="Miranda Chong">Miranda</a></small></p>
<p><p class="wp-caption-text"> </p></div><br />
I must begin by thanking them: the organization was impeccable! I&#8217;m not sure, but I think that at one point Ivelina was even running around buying routers to improve wifi coverage (which is already spectacular in Bulgaria &#8212; I&#8217;ve received reports from <a href="http://pers-www.wlv.ac.uk/~ex0233/" title="Miranda Chong">Miranda</a> that you can get wifi in the mountains!)</p>
<p>The schedule was busy, with three tracks going in parallel, in order to cover a wide range of topics in computational linguistics. The student workshop should also be noted for the excellent quality of the works there.</p>
<p>Of course it would be infeasible to write about all the great people I met and their high quality work. And if I were to write about all the fun we had, it would probably make this post look unprofessional <img src='http://blog.vene.ro/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . This doesn&#8217;t mean I forgot about any of you, and as soon as I get the chance to work on something related, I will most certainly write about it, and you.</p>
<p>So, if I would have to summarize the trends and the ideas stated during the conference and especially during the keynotes, I would say:</p>
<ul>
<li> When talking about word sense disambiguation, it&#8217;s wrong to speak about the different meanings of a word, but rather about the potential a word has for bringing a certain meaning in a certain context. See <a href="http://www.patrickhanks.com/">Patrick Hanks</a>&#8216; <a href="http://nlp.fi.muni.cz/projects/cpa/">Corpus Pattern Analysis</a>.  Without something like this, to have good WSD you need to heavily adjust the overlapping meanings from a Wordnet-style ontology.</li>
<li>Certain relations, such as temporal and spacial ones, can naturally be modeled by complex domain-specific logics (see <a href="http://sites.google.com/site/inderjeetmani/">Inderjeet Mani</a>&#8216;s new book, Interpreting Motion: Grounded Representations for Spatial Language, that is due for publishing). But these only appear in a small subset of human communication. The attempt to map human language to a complete logic in which to do general-purpose inference seems futile: <a href="http://u.cs.biu.ac.il/~dagan/">Ido Dagan</a> suggests textual entailment: reasoning directly in natural language, and only abstracting away to a formal logic system when need arises.</li>
<li>
If you have a large enough sample of n-gram frequency data, you can eventually beat the performance you can get with a limited amount of labeled data, and most importantly: it generalizes much better when going out of the domain you trained on. Apparently the best tool for this at the moment is the <a href="http://ngrams.googlelabs.com/datasets">Google n-gram data</a>, which has some limitations. In time, we can easily extend this data by huge amounts by mining n-grams from Wikipedia (which allegedly has a higher count of distinct n-grams than the Google dataset), and more importantly, by aligning multi-language data, making use of transliterations and cognate identification.
</li>
</ul>
<p>Please note that I may be (or most probably, am) ignorant of older instances of similar ideas, and I may have misunderstood certain claims. Please feel free to discuss in the comments whether you think I forgot about something important, or whether I am plain wrong about something. In particular, I seem to have been completely ignorant of the existence of the Google n-gram data, which has been around for quite a while, so I must have missed other important things as well.</p>
<p>Take care, kind readers, and express your opinion!<br />
V</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2011/09/20/ranlp-2011-in-hissar-bg/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Dictionary learning in scikit-learn 0.9</title>
		<link>http://blog.vene.ro/2011/09/19/dictionary-learning-in-scikit-learn-0-9/</link>
		<comments>http://blog.vene.ro/2011/09/19/dictionary-learning-in-scikit-learn-0-9/#comments</comments>
		<pubDate>Mon, 19 Sep 2011 17:15:16 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[dictionary learning]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://venefrombucharest.wordpress.com/?p=229</guid>
		<description><![CDATA[Thanks to Olivier, Gaël and Alex, who reviewed the code heavily the last couple of days, and with apologies for my lack of activity during a sequence of conferences, Dictionary learning has officially been merged into scikit-learn master, and just in time for the new scikit-learn 0.9 release. Here are some glimpses of the examples...  <a href="http://blog.vene.ro/2011/09/19/dictionary-learning-in-scikit-learn-0-9/" title="Read Dictionary learning in scikit-learn 0.9">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>Thanks to Olivier, Gaël and Alex, who reviewed the code heavily the last couple of days, and with apologies for my lack of activity during a sequence of conferences, Dictionary learning has officially been merged into scikit-learn master, and just in time for the new scikit-learn 0.9 release. Here are some glimpses of the examples you can run for yourself:</p>
<a href="http://blog.vene.ro/wp-content/uploads/2011/09/plot_image_denoising_1.png"><img src="http://blog.vene.ro/wp-content/uploads/2011/09/plot_image_denoising_1.png" alt="Dictionary learned from Lena patches" title="plot_image_denoising_1" width="420" height="400" class="aligncenter size-full wp-image-230" /></a>
<a href="http://blog.vene.ro/wp-content/uploads/2011/09/plot_image_denoising_24.png"><img src="http://blog.vene.ro/wp-content/uploads/2011/09/plot_image_denoising_24.png" alt="Noisy image for denoising" title="plot_image_denoising_24" width="500" height="330" class="aligncenter size-full wp-image-232" /></a>
<a href="http://blog.vene.ro/wp-content/uploads/2011/09/plot_image_denoising_44.png"><img src="http://blog.vene.ro/wp-content/uploads/2011/09/plot_image_denoising_44.png" alt="Image denoising with Dictionary learning and Orthogonal matching pursuit" title="Image denoising with Dictionary learning and Orthogonal matching pursuit" width="500" height="330" class="aligncenter size-full wp-image-235" /></a>
<p>The stars of this new release are: the manifold learning module by Jake Vanderplas and Fabian Pedregosa, the Dirichlet process gaussian mixture model by Alexandre Passos, and many others, as you can see from the <a href="http://scikit-learn.sourceforge.net/dev/whats_new.html" title="scikit-learn development changelog">development changelog</a> (as soon as the release is made, I will update this post with permanent links).</p>
<p>The release is due tomorrow. I will also be in charge with building the Windows installers for this release, let&#8217;s hope I do a good job and you can think of me and smile when installing!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2011/09/19/dictionary-learning-in-scikit-learn-0-9/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Long overdue update. EuroScipy and SSLST 2011</title>
		<link>http://blog.vene.ro/2011/09/05/long-overdue-update-euroscipy-and-sslst-2011/</link>
		<comments>http://blog.vene.ro/2011/09/05/long-overdue-update-euroscipy-and-sslst-2011/#comments</comments>
		<pubDate>Sun, 04 Sep 2011 22:08:38 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://venefrombucharest.wordpress.com/?p=221</guid>
		<description><![CDATA[Anybody reading my blog should have expected me to blog about the end of my GSoC. Sorry to disappoint, but I simply did not experience anything similar to an ending. On the contrary, I feel like things have barely started. Also, I apologize for one of the few posts here without pretty pictures! For the...  <a href="http://blog.vene.ro/2011/09/05/long-overdue-update-euroscipy-and-sslst-2011/" title="Read Long overdue update. EuroScipy and SSLST 2011">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>Anybody reading my blog should have expected me to blog about the end of my GSoC. Sorry to disappoint, but I simply did not experience anything similar to an ending. On the contrary, I feel like things have barely started. Also, I apologize for one of the few posts here without pretty pictures! <img src='http://blog.vene.ro/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>For the last two weeks, I&#8217;ve been traveling. I attended the EuroScipy conference thanks to Fabian, who offered me a place to sleep during the week. We sprinted hard, we discussed tricky APIs, we drank a lot of coffee, beer, and ate well in lovely Paris. It was great to meet all of the celebrities, the people who keep the scientific Python globe turning.</p>
<p>Many thanks to Gael and Emmanuelle, who worked very, very hard on organizing everything, so they weren&#8217;t around and I didn&#8217;t get to say goodbye when I ran to catch my plane last Sunday.</p>
<p>I was in a hurry, heading to Tarragona, a beautiful city on the Catalan coast, where the public university organized the 2011 summer school in linguistics and speech technologies (SSLST). This was a great opportunity to meet many fellow young researchers working in computational linguistics. I will not go into details now, because I plan expand on this, but I would like to state a couple of things. Firsty, even though NLP seems to be mostly a Java-dominated affair (note for example <a href="http://nlp.stanford.edu/software/index.shtml">Stanford&#8217;s NLP toolkit</a> and <a href="http://gate.ac.uk/">Sheffield&#8217;s GATE</a>), the <a href="http://www.clips.ua.ac.be/">computational linguistics and psycholinguistics research center (CLiPS)</a> at the University of Antwerp actually briefly manifested its devotion to Python and NLTK via its research director, Walter Daelemans. </p>
<p>It was good to see a little love for Python in this field. NLTK is very underrepresented in the SciPy community, I couldn&#8217;t find anybody at the EuroScipy conference knowing too much about it or about the people behind it.</p>
<p>Another lab that has done a lot of cool work is <a href="http://www.uclouvain.be/cental">Cental</a> at the Catholic University of Louvain, and they also use Python for natural language processing. Maybe in the coming years, we will  see a Python for Computational Linguistics sattelite, along with Physics and Neuroscience. Doesn&#8217;t it sound more fun? <img src='http://blog.vene.ro/wp-includes/images/smilies/icon_razz.gif' alt=':P' class='wp-smiley' /> </p>
<p>Secondly, I wish SSLST were organized by someone like Gael! As the discussion at dinner regarding who will organize next year&#8217;s EuroScipy went, it is imperative that the organizers be actively involved in the community, and generally passionate about it. Even though I&#8217;m comparing apples and oranges, Carlos Martin-Vide behaved in this context like a old, tired, emotionless academic, not taking into account even lunch breaks for the whole group, not to mention any sort of getting together or even a group photo (which, alas, we were not able to take, apart from small groups.) They said it couldn&#8217;t be done. Of course it could, they just didn&#8217;t want it hard enough.</p>
<p>Finally, before signing off, I would like to announce that because the Romanian Ministry of Education failed to specify the allocated number of public positions for masters&#8217; programmes, the admission exam at the University of Bucharest will be delayed by a couple of weeks. Luckily, this will allow me to attend <a href="http://lml.bas.bg/ranlp2011/start3.php">RANLP 2011</a> in Hissar, Bulgaria a week from now, where I will present my poster entitled:<br />
&#8220;Can alternations be learned? A machine learning approach to Romanian verb conjugation&#8221; by Liviu P. Dinu, Emil Ionescu, Vlad Niculae and Octavia-Maria Sulea. See you in Hissar!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2011/09/05/long-overdue-update-euroscipy-and-sslst-2011/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Optimizing Orthogonal Matching Pursuit code in Numpy, part 2</title>
		<link>http://blog.vene.ro/2011/08/11/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-2/</link>
		<comments>http://blog.vene.ro/2011/08/11/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-2/#comments</comments>
		<pubDate>Thu, 11 Aug 2011 17:39:03 +0000</pubDate>
		<dc:creator>vene</dc:creator>
				<category><![CDATA[dictionary learning]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[blas]]></category>
		<category><![CDATA[efficient]]></category>
		<category><![CDATA[lapack]]></category>
		<category><![CDATA[numpy]]></category>
		<category><![CDATA[omp]]></category>
		<category><![CDATA[orthogonal matching pursuit]]></category>
		<category><![CDATA[potrs]]></category>
		<category><![CDATA[scipy]]></category>

		<guid isPermaLink="false">http://venefrombucharest.wordpress.com/?p=212</guid>
		<description><![CDATA[EDIT: There was a bug in the final version of the code presented here. It is fixed now, for its backstory, check out my blog post on it. When we last saw our hero, he was fighting with the dreaded implementation of least-angle regression, knowing full well that it was his destiny to be faster....  <a href="http://blog.vene.ro/2011/08/11/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-2/" title="Read Optimizing Orthogonal Matching Pursuit code in Numpy, part 2">Read more &#187;</a>]]></description>
			<content:encoded><![CDATA[<p>EDIT: There was a bug in the final version of the code presented here. It is fixed now, for its backstory, check out <a href="http://venefrombucharest.wordpress.com/2011/11/18/the-nasty-bug-crawling-in-my-orthogonal-matching-pursuit-code/" title="The nasty bug crawling in my Orthogonal Matching Pursuit code">my blog post on it</a>.</p>
<p>When we last saw our hero, he was fighting with the dreaded implementation of least-angle regression, knowing full well that it was his destiny to be faster.</p>
<p>We had come up with a more robust implementation, catching malformed cases that would have broken the naive implementation, and also it was orders of magnitude faster than said implementation. However, our benchmark [<a href="#footnote-1">1</a>] showed that it was a couple of times slower than least-angle regression. </p>
<p>By poking around the <code>scikits.learn</code> codebase, I noticed that there is a triangular system solver in <code>scikits.learn.utils.arrayfuncs</code>. Unlike the <code>scipy.linalg</code> one, this one only works with lower triangular arrays, and it forcefully overwrites <code>b</code>. Even though if weren&#8217;t faster, it should still be used: <code>scikits.learn</code> aims to be as backwards-compatible with SciPy as possible, and <code>linalg.solve_triangular</code> was added in 0.9.0. Anyway, let&#8217;s just see whether it&#8217;s faster:</p>
<pre class="brush: python; title: ; notranslate">
In [1]: import numpy as np

In [2]: from scipy import linalg

In [3]: from scikits.learn.datasets import make_spd_matrix

In [4]: from scikits.learn.utils.arrayfuncs import solve_triangular

In [5]: G = make_spd_matrix(1000)

In [6]: L = linalg.cholesky(G, lower=True)

In [7]: x = np.random.randn(1000)

In [8]: y = x.copy()

In [9]: timeit solve_triangular(L, x)
100 loops, best of 3: 3.45 ms per loop

In [10]: timeit linalg.solve_triangular(L, y, lower=True, overwrite_b=True)
10 loops, best of 3: 134 ms per loop
</pre>
<p>Wow! That&#8217;s 40x faster. We&#8217;re catching two rabbits with one stone here, let&#8217;s do the change! Notice that we can just copy \( \mathbf{v}\) into the appropriate place in \( L\) and then solve in place.</p>
<p>But whoops! When solving the \( LL&#8217;\) system, we take advantage of the <code>transpose</code> attribute in <code>linalg.solve_triangular</code>, which the <code>scikits.learn</code> version does not expose. We could think of a solution, but here&#8217;s a better idea: Shouldn&#8217;t there be some way to directly solve the entire system in one go?</p>
<p>Well, there is. It is an LAPACK function by the name of <code>potrs</code>. If you are not aware, LAPACK is a Fortran library with solvers for various types of linear systems and eigenproblems. LAPACK along with BLAS (on which it is based) pretty much powers all the scientific computation that happens. BLAS is an API with multiple implementations dating from 1979, while LAPACK dates from 1992. If you ever used Matlab, this is what was called behind the scenes. SciPy, again, provides a high-level wrapper around this, the <code>linalg.cho_solve</code> function.</p>
<p>But SciPy also gives us the possibility to import functions directly from LAPACK, through the use of <code>linalg.lapack.get_lapack_funcs</code>. Let&#8217;s see how the low-level LAPACK function compares to the SciPy wrapper, for our use case:</p>
<pre class="brush: python; title: ; notranslate">
In [11]: x = np.random.randn(1000)

In [12]: y = x.copy()

In [13]: timeit linalg.cho_solve((L, True), x)
1 loops, best of 3: 95.4 ms per loop

In [14]: potrs, = linalg.lapack.get_lapack_funcs(('potrs',), (G,))

In [15]: potrs
Out[15]: &amp;lt;fortran object&amp;gt;

In [16]: timeit potrs(L, y)
100 loops, best of 3: 9.49 ms per loop
</pre>
<p>That&#8217;s 10 times faster! So now we found an obvious way to optimize the code:</p>
<pre class="brush: python; title: ; notranslate">
def cholesky_omp(X, y, n_nonzero_coefs, eps=None):
    min_float = np.finfo(X.dtype).eps
    potrs, = get_lapack_funcs(('potrs',), (X,))
    alpha = np.dot(X.T, y)
    residual = y
    n_active = 0
    idx = []

    max_features = X.shape[1] if eps is not None else n_nonzero_coefs
    L = np.empty((max_features, max_features), dtype=X.dtype)
    L[0, 0] = 1.

    while 1:
        lam = np.abs(np.dot(X.T, residual)).argmax()
        if lam &amp;lt; n_active or alpha[lam] ** 2 &amp;lt; min_float:
            # atom already selected or inner product too small
            warn(&amp;quot;Stopping early&amp;quot;)
            break
        if n_active &amp;gt; 0:
            # Updates the Cholesky decomposition of X' X
            L[n_active, :n_active] = np.dot(X[:, idx].T, X[:, lam]
            solve_triangular(L[:n_active, :n_active], L[n_active, :n_active])
            d = np.dot(L[n_active, :n_active].T, L[n_active, :n_active])
            if 1 - d &amp;lt;= min_float:  # selected atoms are dependent
                warn(&amp;quot;Stopping early&amp;quot;)
                break
            L[n_active, n_active] = np.sqrt(1 - d)
        idx.append(lam)
        # solve LL'x = y in two steps:
        gamma, _ = potrs(L[:n_active, :n_active], alpha[idx], lower=True,
                         overwrite_b=False)
        residual = y - np.dot(X[:, idx], gamma)
        if eps is not None and np.dot(residual.T, residual) &amp;lt;= eps:
            break
        elif n_active == max_features:
            break
    return gamma, idx
</pre>
<p>Woohoo! But we still lag behind. Now that we delegated the trickiest parts of the code to fast and reliable solvers, it&#8217;s time to use a profiler and see what the bottleneck is now. Python has excellent tools for this purpose. What solved the problem in this case was <code>line_profiler</code> [<a href="#footnote-2">2</a>]. There is a great article by Olivier Grisel here [2] regarding how to use these profilers. I&#8217;m just going to say that <code>line_profiler</code>&#8216;s output is very helpful, basically printing the time taken by each line of code next to that line.</p>
<p>Running the profiler on this code, we found that 58% of the time is spent on line 14, 20.5% on line 21, and 20.5% on line 32, with the rest being insignificant (<code>potrs</code> takes 0.1%!). The code is clearly dominated by the matrix multiplications. By running some more timings with IPython, I found that multiplying such column-wise views of the data as <code>X[:, idx]</code> is considerably slower then multiplying a contiguous array. The least-angle regression code in <code>scikits.learn</code> avoids this by swapping columns towards the front of the array as they are chosen, so we can replace <code>X[:, idx]</code> with <code>X[:, :n_active]</code>. The nice part is that if the array is stored in Fortran-contiguous order (ie. column contiguous order, as opposed to row contiguous order, as in C), swapping two columns is a very fast operation!. Let&#8217;s see some more benchmarks! </p>
<pre class="brush: python; title: ; notranslate">
In [17]: X = np.random.randn(5000, 5000)

In [18]: Y = X.copy('F')  # fortran-ordered

In [19]: a, b = 1000, 2500

In [20]: swap, = linalg.get_blas_funcs(('swap',), (X,))

In [21]: timeit X[:, a], X[:, b] = swap(X[:, a], X[:, b])
100 loops, best of 3: 6.29 ms per loop

In [22]: timeit Y[:, a], Y[:, b] = swap(Y[:, a], Y[:, b])
10000 loops, best of 3: 111 us per loop
</pre>
<p>We can see that using Fortran-order takes us from the order of miliseconds to the order of microseconds!</p>
<p>Side note: I almost fell into the trap of swapping columns the pythonic way. That doesn&#8217;t work:</p>
<pre class="brush: python; title: ; notranslate">
In [23]: X[:, a], X[:, b] = X[:, b], X[:, a]

In [24]: np.testing.assert_array_equal(X[:, a], X[:, b])

In [25]:
</pre>
<p>However this trick works great for swapping elements of one-dimensional arrays. </p>
<p>Another small optimization that we can do: I found that on my system, it&#8217;s slightly faster to compute the norm using the BLAS function <code>nrm2</code>. So by putting all of these together, we end up with the final version of our code:</p>
<pre class="brush: python; title: ; notranslate">
def cholesky_omp(X, y, n_nonzero_coefs, eps=None, overwrite_X=False):
    if not overwrite_X:
        X = X.copy('F')
    else:  # even if we are allowed to overwrite, still copy it if bad order
        X = np.asfortranarray(X)

    min_float = np.finfo(X.dtype).eps
    nrm2, swap = linalg.get_blas_funcs(('nrm2', 'swap'), (X,))
    potrs, = get_lapack_funcs(('potrs',), (X,))

    indices = range(len(Gram))  # keeping track of swapping
    alpha = np.dot(X.T, y)
    residual = y
    n_active = 0

    max_features = X.shape[1] if eps is not None else n_nonzero_coefs
    L = np.empty((max_features, max_features), dtype=X.dtype)
    L[0, 0] = 1.

    while True:
        lam = np.abs(np.dot(X.T, residual)).argmax()
        if lam &amp;lt; n_active or alpha[lam] ** 2 &amp;lt; min_float:
            # atom already selected or inner product too small
            warn(&amp;quot;Stopping early&amp;quot;)
            break
        if n_active &amp;gt; 0:
            # Updates the Cholesky decomposition of X' X
            L[n_active, :n_active] = np.dot(X[:, :n_active].T, X[:, lam])
            solve_triangular(L[:n_active, :n_active], L[n_active, :n_active])
            v = nrm2(L[n_active, :n_active]) ** 2
            if 1 - v &amp;lt;= min_float:  # selected atoms are dependent
                warn(&amp;quot;Stopping early&amp;quot;)
                break
            L[n_active, n_active] = np.sqrt(1 - v)
        X.T[n_active], X.T[lam] = swap(X.T[n_active], X.T[lam])
        alpha[n_active], alpha[lam] = alpha[lam], alpha[n_active]
        indices[n_active], indices[lam] = indices[lam], indices[n_active]
        n_active += 1
        # solves LL'x = y as a composition of two triangular systems
        gamma, _ = potrs(L[:n_active, :n_active], alpha[:n_active], lower=True,
                         overwrite_b=False)

        residual = y - np.dot(X[:, :n_active], gamma)
        if eps is not None and nrm2(residual) ** 2 &amp;lt;= eps:
            break
        elif n_active == max_features:
            break

    return gamma, indices[:n_active]
</pre>
<p>Now, the benchmark at [<a href="#footnote-1">1</a>] indicates victory over least-angle regression! I hope you have enjoyed this short tour. See you next time!</p>
<p>[<span id="footnote-1">1</span>] <a href="https://github.com/scikit-learn/scikit-learn/blob/master/benchmarks/bench_plot_omp_lars.py" title="Orthogonal matching pursuit versus least-angle regression">OMP vs. Lars benchmark</a><br />
[<span id="footnote-2">2</span>] <a href="http://scikit-learn.sourceforge.net/dev/developers/performance.html#profiling-python-code" title="Profiling Python code">Profiling Python code</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.vene.ro/2011/08/11/optimizing-orthogonal-matching-pursuit-code-in-numpy-part-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

