So I’ve been obsessed with CPPNs ever since I saw this series of blog posts by Hardmaru:
One of the main reasons I loved this idea so much is that almost all the machine learning you see concerns itself with fixed output dimensions, at least for images. The cool thing about the CPPN is that it maps pixel coordinates, along with a configurable latent vector \(\vec{z}\), to RGB values:
\[ \text{cppn}(x, y, \vec{z}) = (r,g,b) \]
This is cool because there is a value defined for every point, so you can use these things to create arbitrarily-large pictures! Furthermore, for a given \(\vec{z}\) we can make higher-resolution images simply by evaluating the network over a finer grid of coordinates.
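To make this concrete, here’s a minimal stand-alone CPPN sketch in numpy (purely illustrative; `tiny_cppn` and its sizes are made up, not the repo’s API):

```python
import numpy as np

def tiny_cppn(width, height, z, net_size=20, num_dense=3, seed=0):
    """A minimal CPPN sketch: a small random MLP mapping each pixel
    coordinate (x, y), plus a latent vector z, to an RGB triple."""
    rng = np.random.RandomState(seed)
    # Coordinates scaled to [-1, 1]; one row per pixel.
    xs, ys = np.meshgrid(np.linspace(-1, 1, width), np.linspace(-1, 1, height))
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1)   # (w*h, 2)
    zs = np.tile(z, (coords.shape[0], 1))                 # (w*h, len(z))
    h = np.concatenate([coords, zs], axis=1)
    in_dim = h.shape[1]
    for _ in range(num_dense):
        w = rng.normal(0, 1, size=(in_dim, net_size))
        h = np.tanh(h @ w)
        in_dim = net_size
    w_out = rng.normal(0, 1, size=(in_dim, 3))
    rgb = (np.tanh(h @ w_out) + 1) / 2                    # squash into [0, 1]
    return rgb.reshape(height, width, 3)

img = tiny_cppn(64, 64, z=np.random.RandomState(1).normal(size=8))
print(img.shape)  # (64, 64, 3)
```

Because every \((x, y)\) has a value, calling `tiny_cppn(1024, 1024, z)` with the same seed and the same \(z\) renders the same picture at higher resolution.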
At Silverpond we’ve put this idea to good use in our upcoming event at Melbourne Knowledge Week.
In any case, here I’d like to document my playing-around with the idea of using CPPNs to generate 3d landscapes.
I’ve put together some pieces of code here: cppn-3d. Thanks to the amazing MyBinder you can even run the notebook online, right now, and start generating your own cool images!
To use the Python code, take a look at the notebook and you’ll see something like this (after imports):
latent_dim = 9

TAXICAB   = ft.partial(np.linalg.norm, axis=0, ord=1)
EUCLIDEAN = ft.partial(np.linalg.norm, axis=0, ord=2)
INF       = ft.partial(np.linalg.norm, axis=0, ord=np.inf)
norms = []

c = Config(net_size=20,
           num_dense=5,
           latent_dim=latent_dim,
           colours=3,
           input_size=1 + 1 + len(norms) + latent_dim,
           norms=norms,
           activation_function=tf.nn.tanh)

size = 512
width = size
height = size

m = build_model(c)
z = np.random.normal(0, 1, size=c.latent_dim)

# `sess` is a tf.Session created earlier in the notebook.
sess.run(tf.global_variables_initializer())

yss = forward(sess, c, m, z, width, height)
ys = stitch_together(yss)
The magic here is that we can get quite different pictures by mucking around with the params: net_size, num_dense, norms, activation_function, and basically just about anything!
The very simplistic idea I had was that we can generate images with nice smooth colours, then just map those colours to heights, and that’s the end of it! I did this in three.js and TensorFlow.js at first, with some terrible code:
It worked! You can also play with this live if you wish; it does a kind of cool animation, albeit kinda slowly.
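For reference, the colour-to-height mapping itself is tiny. A numpy sketch (assuming the CPPN output is an RGB array with values in [0, 1]):

```python
import numpy as np

def heights_from_image(rgb):
    """Map an (H, W, 3) RGB array in [0, 1] to an (H, W) height map
    by averaging the channels."""
    return rgb.mean(axis=-1)

rgb = np.random.RandomState(0).uniform(size=(4, 4, 3))  # stand-in image
h = heights_from_image(rgb)
print(h.shape)  # (4, 4)
```

Smoothness of the CPPN output is what makes this naive mapping give walkable-looking terrain rather than noise.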
Of course, what I really wanted was to get a feel for how “walkable” or “playable” the resulting map would be. So I found my way to Unity3D, and half-wrote half-googled a tiny script to load in the image as a height map:
using System.IO;
using UnityEngine;

public class TerrainHeight : MonoBehaviour {
    public int height = 400;
    public int width = 400;
    public int depth = 200;
    public string cppnImage = "/home/noon/dev/cppn-3d/python/multi-2.png";

    void Start () {
        Terrain terrain = GetComponent<Terrain>();
        terrain.terrainData = GenerateTerrain(terrain.terrainData);
    }

    TerrainData GenerateTerrain (TerrainData data) {
        data.size = new Vector3(width, depth, height);
        data.SetHeights(0, 0, GenerateHeights());
        return data;
    }

    public static Texture2D LoadPng (string filePath) {
        byte[] data = File.ReadAllBytes(filePath);
        Texture2D texture = new Texture2D(2, 2);
        texture.LoadImage(data);  // Resizes the texture to the PNG's dimensions.
        return texture;
    }

    float[,] GenerateHeights () {
        float[,] heights = new float[width, height];
        Texture2D image = LoadPng(cppnImage);
        for (int x = 0; x < width; x++) {
            for (int y = 0; y < height; y++) {
                // Average the colour channels to get a height in [0, 1].
                Color colour = image.GetPixel(x, y);
                heights[x, y] = (colour.r + colour.g + colour.b) / 3f;
            }
        }
        return heights;
    }
}
In Unity3D, you attach this script to a terrain, and when you run it, it sets that piece of terrain to the heights you want!
Looks alright! Obviously my general Unity skills need work, but at least it looks something like a landscape! Here are a few more top views generated from a bunch of similarly-produced images:
The images that generated these (not in order) are in the maps folder:
Anyway, I hope someone finds this useful! I hope I can play with this idea a bit more! I think there’s a lot of juice to squeeze here: using CPPNs to generate different levels of detail; adding much more detail to the Unity terrain by making decisions based on height (such as where water goes, where snow starts, etc.). Furthermore, it would be neat to auto-generate town locations, and just about everything! Then of course there are all the details of the CPPN itself to play with: the layer structure, adding more variables, using different norms to highlight different regions of the resulting image; the mind boggles at the options!
I hope this demonstrates how fun CPPNs can be!
As an aside, early in the day I was experimenting with producing large tiled images.
The basic idea is conveyed here:
On the left I have a particular image that I’ve generated. I want to continue this image downwards by one tile. On the right is the same image with the next tile.
This idea was due to Gala (who works for Neighbourlytics): basically, given that we have the optimisation machinery at hand, why not just attempt to find a new image, from the network, whose border matches the existing image at the edge we’re interested in?
Initially, my idea was that I could do this by optimising over the latent vector \(\vec{z}\) only; i.e. leaving all the other parameters of the network alone. This turned out not to work at all. I’m actually not quite sure why, because my experience with CPPNs is that if \(\vec{z}\) is large, then you can get a whole bunch of variation by modifying it. In any case, I tried it, and while it did manage to make some progress, it was never particularly good.
When that approach didn’t work, I used the one that generated the tile connections from above: I just optimised with respect to the entire CPPN network.
There were a few problems with this approach, unfortunately:
The tiles it generated were less “interesting”: in the image above, the one on the left is made of 3 tiles. The top one is the starting one; note its complexity. The following two tiles are very low in interesting-ness, but the final one is actually not bad. This perhaps makes sense: when the optimiser only has to match one colour, it can allow itself some richness in the other region.
It didn’t work when I tried to match up two boundaries:
In all these pictures, the bottom-right tile is very out of sync with its two neighbours. This could definitely be fixed “in post” by simply blending it, but it’s still slightly unsatisfying that I couldn’t solve this within the CPPN framework. One original idea I had was to solve it by using (something like) the interpolation process you see in the live JS example. Namely, we can pick two vectors \(\vec{z_1}\) and \(\vec{z_2}\) and move smoothly between them. When you watch this animate, you can feel like there should be some smoothing operation that would let us draw out a long line in this fashion. I think the approach would be to take, slice-by-slice, new images from vectors \(\vec{z_{n+1}}\), and use the slices from them to produce a landscape. This feels slightly odd to me, but perhaps it would be nice.
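That slice-by-slice idea can be sketched with a stand-in renderer (illustrative only; `render_slice` here is a fixed random MLP standing in for the real CPPN):

```python
import numpy as np

def render_slice(z, width=32, height=8, seed=0):
    """Stand-in for a CPPN render: a fixed random MLP mapping
    (x, y, z) -> greyscale, so nearby z give similar-looking slices."""
    rng = np.random.RandomState(seed)  # same seed => same "network"
    xs, ys = np.meshgrid(np.linspace(-1, 1, width), np.linspace(-1, 1, height))
    feats = np.stack([xs.ravel(), ys.ravel()], axis=1)
    h = np.concatenate([feats, np.tile(z, (feats.shape[0], 1))], axis=1)
    for _ in range(3):
        h = np.tanh(h @ rng.normal(0, 1, size=(h.shape[1], 16)))
    out = np.tanh(h @ rng.normal(0, 1, size=(16, 1)))
    return out.reshape(height, width)

# Walk smoothly from z1 to z2, stacking one slice per step into a strip.
rng = np.random.RandomState(1)
z1, z2 = rng.normal(size=4), rng.normal(size=4)
slices = [render_slice((1 - t) * z1 + t * z2) for t in np.linspace(0, 1, 10)]
strip = np.vstack(slices)  # a tall, smoothly-varying map
print(strip.shape)  # (80, 32)
```

Because the interpolation in \(\vec{z}\) is continuous and the network is smooth, adjacent slices vary gently, which is exactly the property you’d want for a long landscape strip.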
In the end, my realisation was that I can produce very large maps simply by increasing the richness in the CPPN: increasing the numbers of dense layers, and “net size” (units in the dense layers), and then just simply making a high-resolution version of the resulting image:
In many ways I think I’m still a bit unsatisfied by this approach. I think ultimately it would be nice to have a grid-layout map:
Where each block is controlled by some vector \(\vec{z_i}\), and those can be modified at will. This would definitely be possible just by blending in some standard way between the particular \(\vec{z_i}\)-values, but I do still think there should be a CPPN-based solution. One idea Lyndon had was to directly construct the image from the grid, encode it back into the CPPN, then decode it, to get the “closest match” that is still smooth across the borders. I think this might work, but here we don’t have an encoding network.
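The “standard blending” between grid \(\vec{z_i}\)-values could look like this (a sketch; `blended_z` is a hypothetical helper, and bilinear blending is just one choice):

```python
import numpy as np

def blended_z(grid_z, u, v):
    """Bilinearly blend a 2D grid of latent vectors at fractional
    position (u, v) in [0, 1]^2, so z varies smoothly across blocks."""
    gh, gw, _ = grid_z.shape
    x, y = u * (gw - 1), v * (gh - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, gw - 1), min(y0 + 1, gh - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * grid_z[y0, x0] + fx * grid_z[y0, x1]
    bot = (1 - fx) * grid_z[y1, x0] + fx * grid_z[y1, x1]
    return (1 - fy) * top + fy * bot

grid = np.random.RandomState(0).normal(size=(3, 3, 8))  # 3x3 grid of z_i
z = blended_z(grid, 0.5, 0.5)  # a z halfway between the four centre cells
print(z.shape)  # (8,)
```

Feeding `blended_z(grid, u, v)` into the CPPN for each pixel’s \((u, v)\) would give a map that honours each block’s \(\vec{z_i}\) while staying smooth at the seams.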
If you have any ideas along these lines, or find any of this useful, then I’d love to hear from you!
I found a cool plugin in Unity — The Terrain Toolkit — that lets me easily add textures, and I worked out how to add a water plane (you just find it in the standard assets, and drag on the “Prefab”, and resize it), so we can give the maps a more earthly look and feel:
So cool! (I also updated the code so you can more easily express richer layers in the CPPN, check out the Jupyter Notebook Generate Maps for more deets.)
So this year I’ve finally started my long-dreamt-about fashion label where each design is produced by some kind of computer program: Noon on PAOM.
For the time being I’m playing around with the website Print All Over Me. I like it because you can, surprisingly, put your designs all over the clothes!
So far I’ve made two main things,
Links to buy:
This is the first thing I’ve made in this way, and it’s interesting to me because it combines many things I’m interested in.
It’s built using deep learning, Haskell and dance. Specifically, I used what’s referred to as a “Pose Network”, to watch a dance video, and infer from that video the poses that the dancer was in at the time. From there, I used a small Haskell program to take those poses, and lay them out in a colourful way.
These were inspired by some 80s-style retro imagery that I found one day. I also put a bit of thought into how to get the graphics displayed in a nicely-randomised way, and via my friend Reuben came up with a scheme that I wrote about on the Silverpond blog: Low-Discrepancy Sequences, Haskell, and T-Shirts!
In the end I decided to make a whole bunch of different items available, so hopefully there is something for everyone here. If there’s something in the PAOM catalogue that I haven’t created, or you’d like a custom colour scheme, then get in touch and I can make something for you! Also, if you do end up buying something, tag it with #retrohaskell or #nvds. I’d love to see how it looks, and any crazy colour combinations!
Below I’ve enumerated all the items in the store, so you can click on the thing you like to buy it if you wish! Hopefully you’ll be seeing more open source clothing from me in the future :)
Given that it’s the end of the year, I thought it would be nice to generate^{1} a list of all the ideas that we came up with in 2017:
Unsurprisingly, given my current job, most of the ideas center around deep learning in some capacity. At a glance, I’d say that at least half of the deep learning ideas I’ve had have already been completed (in almost all cases not by me!) so that’s kinda cool. Seems like the next most popular thing to have an idea about is fashion, which also probably won’t shock you.
Bonus!
Because this is the first time I’ve done this post, here are all the ideas since the beginning of time:
stack.yaml for haskell should be parsed as a readme and run by jenkins directly?
sign functionality that will help here.
stack
bob is friendly; bob attacks susie; type error: bob attacks susie is not consistent with bob is friendly
lsft or something. whatever, as probs won’t want it everytime ls runs.)
It was a ‘simple’ matter of jq '[.[] | select (.created_at >= "2016-01-01")]' all.json > this-year.json, where I got all.json by combining the pages, fetched via curl "https://api.github.com/repos/silky/ideas/issues?state=all&per_page=100&page=1" > p1.json (and so on), with jq -s '[.[][]]' p*.json > all.json (jq is just so easy! …)↩
For the final event of the Melbourne Creative AI meetup this year I thought we’d run a small, fun, competition - a deep learning dance smackdown.
The event details are on Meetup.
The basic idea is, build a program that (does something like):
The exact format is flexible; some options would be:
You may like to use:
The format on the night will simply be to show how your particular idea works. It could be interactive, it could be online, it could be a presentation, it could be embodied in a robot! I leave it to your imagination.
The prize on the night will be one of the Evolution of Dance T-Shirts:
If you want to participate, then just send me an email, or just come along on the night with something to show!
The Three Laws of Robotic Dancing (courtesy of andyg)
So a few days ago I put The Internet High 5 Machine! live, finally!
It’s been a few years in the making (it turns out the first commit I made was in 2015.)
It started as a fun idea I had, for a way to share successes with friends overseas. The idea would be that you both have a machine, and then you somehow control their machine to give them a High 5, physically.
I let the idea sit for a while, and picked it up again as I was near the end of my recent degree.
My approach was to build the website and necessary tools in Haskell, so that I had a concrete project with which to learn. I didn’t know Haskell very well at the time, so it seemed like a good opportunity.
I looked around at a few web frameworks, but I had previously played with Yesod to build super-reference, so I opted to go with that. I actually still quite like Yesod, even though there are some more innovative things (servant) that I think I will explore in the future.
I would say I generally quite enjoyed the Haskell/Yesod experience, but I did have to make a somewhat-inappropriate number of PRs to random projects to get it all pulled together. In any case, it at least demonstrated to me that Yesod is a stable foundation on which to build, and that the Haskell ecosystem is very friendly and quite good to work with.
One of the main ways I made progress initially was to look at other well-written Yesod projects, and for this purpose I found pi-base completely invaluable. Interestingly, they’re in the process of shifting to servant :)
I put together the core bit of “client-side” high5 code in Haskell as well, and that’s open source here. This serves as a reference implementation of what I’m referring to as the “High 5 Protocol”:
This is simply a list of expected conversations that you can have over a websocket connection with the high5.cool website. If you integrate anything with this protocol, you can activate it via the High 5’ing action on the website.
One of the ideas that I can now completely check-off is that anyone can place a “High 5 Me” button on their website (as I’ve now done on mine). I made a setting in the website so that you can allow “anonymous” high 5s (those from people not yet registered with the site). In this way, anyone who appreciates anything you’ve done can give you feedback in a fun way!
There are some badges and other such things in the “profile” section of the High 5 website.
If you have any feedback on the website, then send me an email; I’d love to hear from you! (Alternatively, just send me a High 5!)
For the time being, I’ll let the website sit for a while and add a few features that are missing; and hopefully we’ll have some more things to announce in a few months!
In my job I’m almost entirely surrounded by men.
I work as a Machine Learning Engineer at Silverpond. It’s definitely the best place I’ve ever worked, but there’s one area that we’re actively trying to improve: recruiting women into roles as software engineers and machine learning engineers.
In this post I’ll cover some ideas I’ve had along these lines.
When I’ve been involved in recruiting people, in recent history, it’s tended to be almost entirely by word of mouth. My process is:
This actually works very well, but only because I tend to go to a lot of events, and I meet a lot of people. The one key problem with this approach is: I mostly end up recommending men.
I think that the reason for this can be somewhat explained by various cognitive biases (notably, “Ingroup bias”), and so I’ve been thinking about how to address it.
I was pointed at a few articles along these lines, and I’ll summarise them here:
Avoiding the ‘merit trap’ - Chief Executive Women and Male Champions of Change
Beginning with Ourselves - Airbnb
How blind auditions help orchestras to eliminate gender bias - The Guardian, and How Companies Are Taking Unconscious Bias Out of the Hiring Equation - LinkedIn Talent Blog
Our experiences in elevating the representation of women in leadership - Male Champions of Change
Concretely, what I’m going to do in this regard — getting more diverse applicants into the top of our “hiring funnel” — is:
and during the interview process, I think the following ideas are useful to keep in mind:
This is kind of a Part-1 of my thoughts on this topic. I’ve had this post sitting in drafts for a while, so I thought I’d publish it. I expressed some related ideas in our blog. Let’s see how it progresses over the next few months.
This post is the second of a three-part series on the paper An adaptive attack on Wiesner’s quantum money. The other parts are:
Now that we’re familiar with Wiesner’s original scheme for quantum money, we can take a look at the “bomb testing” attack presented in the paper. The paper actually introduces two attacks, one more general than the other, but I’ll focus on the less general one — the bomb testing attack — because it’s more fun.
But before we see how it’s applied, let’s take a look at the bomb testing idea, because it’s very cool.
Clicking on a bomb will attempt to detonate it. In this way you can see if a bomb is a dud or not.
The task is: Can you determine a test that separates the live bombs from the dud ones?
Classically, one approach would be to simply attempt to detonate each bomb. If the bomb goes off, then it was a live bomb, and if it doesn’t, it wasn’t! Simple enough, but it leaves us with no good bombs to actually use, and is reasonably fraught with danger.
Quantumly, it turns out there is something we can do that actually lets us know if the bomb is live or not, and keeps the bomb un-detonated!
In this model, we suppose that our bombs are configured like so:
Live bomb: Has a single-photon detector. If a photon hits it, it explodes. | Dud bomb: Does not have a photon detector. The photon will pass unchanged through this bomb, and the bomb itself does not explode.
We then recall the operation of a Mach-Zehnder Interferometer.
Due to the release of (yet another) Python framework for quantum simulation ProjectQ, I was inspired to revisit the paper An adaptive attack on Wiesner’s quantum money from a few years back.
This post will form the first post of a three-part series on the paper, and the background necessary to understand the part of it I’ll cover. We’ll learn about:
Wiesner’s Quantum Money scheme is one of the earliest ideas in quantum computing. The fundamental idea is that (unknown) quantum states cannot be copied arbitrarily and hence make an “unforgeable” form of money: it shouldn’t be possible to duplicate a “quantum” bill.
Money, in Wiesner’s scheme, is created by the bank. The bank holds a serial number and a secret “key” that the bank uses to verify each note that it hands out. When given a note, the bank can determine if it is valid by referring back to this secret key.
The important steps are:
Money generation: Done by the bank. The bank creates money by randomly picking a quantum state, call it \(|\$\rangle\), handing that state to the customer, and recording the state it generated against the serial number of the note.
Money verification: When someone wishes to spend money, the bank verifies it.
Let’s take a look in detail. We’ll first need to recall some standard quantum states:
\[ \begin{aligned} |0\rangle &= \left( \begin{array}{c} 1 \\ 0 \end{array} \right), \\ |1\rangle &= \left( \begin{array}{c} 0 \\ 1 \end{array} \right), \\ |+\rangle &= \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right), \\ |-\rangle &= \frac{1}{\sqrt{2}} \left( |0\rangle - |1\rangle \right). \end{aligned} \]
The first two states, \(\{|0\rangle, |1\rangle\}\), form a basis^{1} – the computational basis – for a single qubit, and the second two, \(\{|+\rangle, |-\rangle\}\), form a different (also orthonormal) basis – the Hadamard basis.
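In numpy, these four states are easy to write down and check:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
plus  = (ket0 + ket1) / np.sqrt(2)
minus = (ket0 - ket1) / np.sqrt(2)

# Each basis is orthonormal...
assert np.isclose(ket0 @ ket1, 0) and np.isclose(plus @ minus, 0)
assert np.isclose(plus @ plus, 1)

# ...and a computational-basis state has equal weight in the Hadamard
# basis: measuring |0> in the Hadamard basis gives |+> half the time.
print(abs(ket0 @ plus) ** 2)  # ≈ 0.5
```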
The crucial idea to understanding the scheme is that in quantum mechanics, measurement can irreversibly destroy a given quantum state, changing it to be a completely different one.
To measure a given state in quantum mechanics is to first fix a set of potential measurement outcomes, and then “look” at the given state to see which one of these outcomes the state goes into. Let’s see an example.
Suppose we have a single qubit in some unknown state:
\[ \begin{aligned} |\psi\rangle &= \alpha |0\rangle + \beta |1\rangle \end{aligned} \]
We can measure the qubit in either the computational basis or the Hadamard basis.
Computational basis: Noting that \(|\psi\rangle\) is already expressed in terms of the computational basis, if we measure it in this basis, the Born rule for measurement says that we will get the state \(|0\rangle\) with probability \(|\alpha|^2\) and the state \(|1\rangle\) with probability \(|\beta|^2\).
Hadamard basis: Note that \(|0\rangle = \frac{1}{\sqrt{2}} \left( |+\rangle + |-\rangle \right)\) and \(|1\rangle = \frac{1}{\sqrt{2}} \left(|+\rangle - |-\rangle \right)\) so we can re-write \(|\psi\rangle\) as \[ \begin{aligned} |\psi\rangle &= \frac{\alpha + \beta}{\sqrt{2}} |+\rangle + \frac{\alpha - \beta}{\sqrt{2}} |-\rangle \end{aligned} \] and so, again by the Born rule, we would achieve outcome \(|+\rangle\) with probability \(\left|\frac{\alpha + \beta}{\sqrt{2}}\right|^2\) and \(|-\rangle\) with probability \(\left|\frac{\alpha - \beta}{\sqrt{2}}\right|^2\).
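We can check these Born-rule numbers numerically. A quick sketch with an example state (here \(\alpha = 0.6\), \(\beta = 0.8\) are just illustrative amplitudes):

```python
import numpy as np

alpha, beta = 0.6, 0.8            # example amplitudes, |a|^2 + |b|^2 = 1
psi = np.array([alpha, beta])     # alpha|0> + beta|1>

plus  = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)

p_comp = [abs(psi[0])**2, abs(psi[1])**2]           # computational basis
p_had  = [abs(plus @ psi)**2, abs(minus @ psi)**2]  # Hadamard basis

print([round(p, 4) for p in p_comp])  # [0.36, 0.64]
print([round(p, 4) for p in p_had])   # [0.98, 0.02]
```

Note both sets of probabilities sum to 1, but they are very different distributions: the basis you measure in determines what you can learn, and what you destroy.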
The point to note here is that the final state is different depending on which basis we measured it in. Wiesner used this fact to build a (hopefully) unforgeable form of money.
Money generation: To withdraw money from a bank in Wiesner’s scheme, the bank performs the following steps:
Money verification: To verify a given piece of money, the bank proceeds as follows:
Having the same money state returned, instead of a new one each time validation succeeds, is critical to the success of the forging approach of arXiv:1404.1507.
Let’s look at an example. Suppose we have withdrawn some money from the bank, and the state we’ve been given (but can’t see) is
\[ \begin{aligned} |\$\rangle &= |+100--\rangle. \end{aligned} \]
There are six qubits, and from the notation we can see which basis each has been prepared in; but the customer doesn’t know this information.
Our goal is to create a state \(|F\rangle\) that the bank will also verify as valid.
Noting that if we measure either \(|+\rangle\) or \(|-\rangle\) in the computational basis, we’ll get \(|0\rangle\) with 50% probability or \(|1\rangle\) with 50% probability, one approach is simply to build \(|F\rangle\) by the following technique:
In our example, we can see that this will work 50% of the time for the first qubit of \(|\$\rangle\), 100% of the time for the 2nd, 3rd and 4th qubits, and again 50% of the time for the last two qubits. So for this state, this approach will succeed with probability \(\left( \frac{1}{2} \right)^3 = \frac{1}{8}\). Pretty bad odds.
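This naive attack is easy to simulate (a sketch; the function name is made up, and the 50% pass rate for Hadamard-basis qubits is as argued above):

```python
import random

def naive_forge_succeeds(bases, rng):
    """One attempt at the naive attack on a note whose qubits were
    prepared in `bases` ('C' = computational, 'H' = Hadamard).
    Measuring in the computational basis copies a 'C' qubit perfectly;
    an 'H' qubit collapses, and the copy then passes the bank's
    Hadamard-basis check only half the time."""
    return all(b == 'C' or rng.random() < 0.5 for b in bases)

rng = random.Random(0)
bases = ['H', 'C', 'C', 'C', 'H', 'H']   # the |+100--> example above
trials = 100_000
wins = sum(naive_forge_succeeds(bases, rng) for _ in range(trials))
print(wins / trials)   # ~0.125, matching the (1/2)^3 = 1/8 above
```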
In an earlier paper, Molina, Vidick and Watrous show that in a model where the attacker doesn’t interact with the bank after receiving the note, the best attack that one can mount results in a success probability of \(\left(\frac{3}{4}\right)^n\), where \(n\) is the number of qubits in the money state. This is better than my approach here, but if we set \(n\) to be of modest size, say \(n = 10\), then this approach will succeed at most 5 times out of 100; still not particularly good. If we had 100 twenty dollar notes, we could attempt to forge them, and (since a failed attempt destroys the note, while a success leaves us with two valid notes) we’d end up with a total of \(2 \times 5 \times 20 = \$200\) instead of the original \(\$2,000\) we started with.
We’ve seen that Wiesner’s original scheme for quantum money doesn’t appear to be forgable with our first ideas. In the next post we’ll learn about a very cool technique in quantum mechanics, the Elitzur-Vaidman bomb tester, and then we’ll see how it can be used to beat Wiesner’s scheme!
A one-qubit basis is a set of states such that any single-qubit state can be written as a linear combination of the elements of the basis.↩
(This post requires a background in the basics of quantum computing (and neural networks). Please have a read of the first part of Introduction to quantum computing and the surface code if you’d like to get up to speed on the quantum parts, Neural networks and Deep Learning is a good introduction to the other part.)
Recently, I’ve been spending a lot of time thinking about machine learning, and in particular deep learning. But before that, I was mostly concerning myself with quantum computing, and specifically the algorithmic/theory side of quantum computing.
In the last few days there’s been a flurry of papers on quantum machine learning/quantum neural networks, and related topics. In fact, there’s been a fair bit of research in the last few years (see the Appendix at the end for a few links), and I thought I’d take this opportunity to have a look at what people are up to.
The papers we’ll be discussing are:
But first, let’s take a look at the paper that got me interested in machine learning in the first place!
The paper, Quantum algorithms for supervised and unsupervised machine learning by Lloyd, Mohseni and Rebentrost in 2013, was one of my first technical exposures to machine learning. It’s an interesting one because it demonstrates that for certain types of clustering algorithms there is a quantum algorithm that exhibits an exponential speed-up over the classical counterpart.
Aside: Gaining complexity-theoretic speed-ups is the central task of (quantum) complexity theory. The speedup in this paper is interesting, but it “only” demonstrates a speed-up on a problem that is already known to be efficient^{1} for classical computers, so it doesn’t provide evidence that quantum computers are fundamentally more powerful than classical ones, by the standard notions in computer science.
The Supervised clustering problem that is tackled in the paper is as follows:
Given some vector \(\vec{u} \in \mathbb{R}^N\) and two sets of vectors \(V\) and \(W\), along with \(M\) representative samples from each: \(\vec{v}_j \in V\) and \(\vec{w}_k \in W\), figure out which set \(\vec{u}\) should go into, by comparing the distances to these vectors.
In pictures it looks like so:
Classically, if we think about where we’d like to put \(\vec{u}\), we could compare the distance to all the points \(\vec{v_1}, \vec{v_2}, \vec{v_3}\) and to all the points \(\vec{w_1}, \vec{w_2},\vec{w_3}\). In the specific example I’ve drawn, doing so will show that, out of the two sets, \(\vec{u}\) belongs in \(V\).
In general, we can see that, using this approach, we would need to look at all \(M\) data points, and we’d need to compare each dimension of the \(N\) dimensions of each vector \(\vec{v_j}, \vec{w_k}, \vec{u}\); i.e. we’d need to look at at least \(M\times N\) pieces of information. In other words, we’d compute the distance
\begin{align*} d(\vec{u}, V) = \left| \vec{u} - \frac{1}{M}\sum_{j=1}^{M} \vec{v_j} \right| \end{align*}

By looking at the problem slightly more formally, we find that classically the best known algorithm takes “something like”^{2} \(\texttt{poly}(M\times N)\) steps, where “\(\texttt{poly}\)” means that the true running time is a polynomial of \(M\times N\).
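The classical computation is just this distance comparison (a toy numpy sketch; the data is made up):

```python
import numpy as np

# Toy data: M = 3 representative samples from each of V and W, N = 2 dims.
V = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
W = np.array([[-1.0, -1.0], [-2.0, -1.0], [-1.0, -2.0]])
u = np.array([1.5, 1.5])   # the point to classify

# d(u, V) = | u - (1/M) sum_j v_j |, as in the formula above.
d_V = np.linalg.norm(u - V.mean(axis=0))
d_W = np.linalg.norm(u - W.mean(axis=0))
print('V' if d_V < d_W else 'W')  # V -- u sits among the v_j
```

Even in this tiny sketch you can see the cost: the mean touches every entry of the \(M \times N\) sample matrix, which is exactly the \(M \times N\) scaling the quantum algorithm improves on.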
Quantumly, the paper demonstrates an algorithm that lets us solve this problem in “something like” \(\log(M\times N)\) time steps. This is a significant improvement, in a practical sense. To get an idea, if we took \(100\) samples from a \(N = 350\)-dimensional space, then \(M\times N = 35,000\) and \(\log(M \times N) \approx 10\).
The quantum algorithm works by constructing a certain state so that, when measured, the distance that we wanted, \(d(\vec{u}, V)\), is the probability that we achieve a certain measurement outcome. In this way, we can build this state, and measure it, several times, and use this information to approximate the required distances. And, the paper shows that this whole process can be done in “something like” a running time of \(\log(M\times N)\).
There are more contributions in the paper than just this, so it’s worth a look if you’re interested.
So this paper is pretty cool. We can get a feel for what it’s doing by first considering the following network:
This network has two inputs, \(x_1, x_2\), three learnable weights, \(w_1, w_2, w_3\), one output value \(y\), and an activation function \(f\).
Classically one would feed in a series of training examples \((x_1, x_2, y)\) and update the weights according to some loss function to achieve the best result for the given data.
Quantumly, there are some immediate problems with doing this, if we switch the inputs \(x\) to be quantum states, instead of classical real variables.
The problems are:
The way this paper solves these problems is to transition Figure 2 from a classical non-reversible network to a reversible quantum one:
The final network takes in an arbitrary quantum state of two qubits, \(x_1, x_2\), and then adjoins an ancilla state \(|0\rangle\), applies some unitary operation \(U\), and emits a combined final state \(|\psi\rangle^{\text{Out}}_{x_1,x_2,y}\) where the final qubit \(y\) contains the result we’re interested in.
At this point, one might reasonably ask: How is this different to a quantum circuit? It appears to me that the only difference is that \(U\) is actually unknown, and it is trainable! Note that this is also a somewhat radical difference from classical neural networks: there, we don’t normally think of the activation functions (defined as \(f\) above) as trainable parameters; but quantumly, in this paper, that’s exactly how we think of them!
It turns out that unitary matrices can be parameterised by a collection of real variables \(\alpha\). Consider an arbitrary unitary matrix operating on two qubits, then \(U\) can be written as:
\begin{align*} U = \exp\left[ i \left( \sum_{j_1,j_2=0,0}^{3,3} \alpha_{j_1, j_2} \times \left(\sigma_{j_1} \otimes \sigma_{j_2}\right) \right) \right] \end{align*}

where \(\sigma_i, i \in \{1,2,3\}\) are the usual Pauli matrices and \(\sigma_0\) is the \(2\times 2\) identity matrix. So one can then make these parameters \(\alpha_{j_1, j_2}\) the trainable parameters! It turns out that in the paper they don’t train these parameters explicitly; instead they pick a less general way of writing down unitary matrices, and they construct, by hand, a unitary for two qubits. It’s not clear why they’ve done this, and it would not be fun to have to build a special trainable unitary matrix for each node/neuron of your architecture depending on its input.
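Here’s a quick numerical check of this parameterisation (a sketch using `scipy.linalg.expm`; the \(\alpha\) values are random, not trained):

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices; sigma_0 is the identity.
s = [np.eye(2),
     np.array([[0, 1], [1, 0]]),
     np.array([[0, -1j], [1j, 0]]),
     np.array([[1, 0], [0, -1]])]

# 16 real parameters alpha_{j1, j2} -> a Hermitian H -> a 4x4 unitary.
alpha = np.random.RandomState(0).normal(size=(4, 4))
H = sum(alpha[j1, j2] * np.kron(s[j1], s[j2])
        for j1 in range(4) for j2 in range(4))
U = expm(1j * H)

# U is unitary: U U^dagger = I (up to numerical error), for ANY real alpha.
print(np.allclose(U @ U.conj().T, np.eye(4)))  # True
```

Since any real \(\alpha\) gives a valid unitary, gradient updates on \(\alpha\) never leave the space of unitaries, which is what makes this a workable trainable parameterisation.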
Update: Kwok-Ho kindly corrected me that they do indeed train directly on this form of unitary matrices, and that the simplification they do in the paper is used to investigate the loss surface.
In any case, the main contribution of this paper seems to me to be the idea that we can learn unitary matrices for our particular problem. They go on to demonstrate that this idea works to build a quantum autoencoder, and to make a neural network discover unitary matrices that perform the quantum teleportation protocol.
One view is that trying to learn arbitrary unitary matrices that perform a task really well will become too hard as the number of neurons grows. If we had a large network, with potentially millions of internal neurons (and hence unitaries) to learn, then it might be more effective to fix unitaries and instead focus on learning the weights.
However, it’s a promising technique that would be fun to try out.
Those of you familiar with neural networks will know that the central idea used to train them is Gradient Descent. We recall that gradient descent lets us know how to modify some vector \(x\) so that it does “better” when evaluated with some cost function \(C(x, y)\), where \(y\) is some known good answer. I.e. \(x\) might be a probability of liking some object, \(y\) might be the true probability, and \(C(x,y) = |x-y|^2\).
The paper supposes we have some quantum state \(|x\rangle = \sum_{j=1}^N x_j |j\rangle\) (where \(|j\rangle\) is the \(j\)’th computational-basis state), and some cost function \(C(|x\rangle, |y\rangle)\) that tells us how good \(|x\rangle\) is. The question is: given that we can evaluate \(C(|x\rangle, |y\rangle)\), how can we best work out how to modify \(|x\rangle\) to do better?
If this was entirely classical, we could just calculate the gradient of \(C\) with respect to the variables \(x_j\), and then propose a new set of \(x_j\)’s. However, we can’t inspect all these values quantumly, so we need to do something else.
In the paper, they demonstrate an approach that requires a few copies of the current state \(|x^{(t)}\rangle\), but will produce a new state \(|x^{(t+1)}\rangle\) such that (with objective/loss function \(f\)):
\begin{align*} |x^{(t+1)}\rangle = |x^{(t)}\rangle - \eta |\nabla f\left(x^{(t)}\right)\rangle \end{align*}for some step size \(\eta\). That is, it’s a step in the (hopefully) right direction, as per normal gradient descent!
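The classical analogue of this update is just the ordinary gradient-descent step. A minimal sketch, using the quadratic cost \(C(x,y) = |x-y|^2\) from above:

```python
import numpy as np

# Classical analogue of |x^{(t+1)}> = |x^{(t)}> - eta * |grad f(x^{(t)})>:
# plain gradient descent on f(x) = |x - y|^2 with a known target y.
def grad_descent(x, y, eta=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (x - y)   # gradient of f(x) = |x - y|^2
        x = x - eta * grad   # the step the quantum algorithm reproduces in state form
    return x

y = np.array([0.3, 0.7])
x_final = grad_descent(np.zeros(2), y)
assert np.allclose(x_final, y, atol=1e-6)  # x converges to the target
```

The novelty in the paper is that the same step is performed on a quantum state that we cannot simply read out, using a few copies of \(|x^{(t)}\rangle\) instead of direct access to the coordinates.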
So one direction to take this paper would be to build a “fully quantum” neural network like so:
where we make the weights quantum states, and the weights are multiplied onto the inputs as a dot-product. This would require that the weight state is the same size as the input state; but that should be possible because we’re the ones building the network structure.
Update: The idea about multiplying weights in didn’t make any sense; a much more sensible idea would be to prepare something like \(|w_i\rangle\langle w_i|\) and enact this on the input \(|x_k\rangle\), and then apply the fixed unitary.
We could then not worry about learning unitary matrices, and analogously to standard neural networks, just pick some unitary \(U\) that “works well” in practice, maybe by just defining quantum analogues of some common activation functions, perhaps say the ReLU or ELU.
Overall I think that the quantum gradient descent algorithm should be useful for training neural networks, and maybe some cool things will come from it. There are some natural direct extensions of this work; namely to extend the implementation to the more practical variations.
This paper came out only a few days after the Wan et al paper that we covered above, that also discussed autoencoders, so I thought it was worth a glance to see if this team did things differently.
This paper again takes the approach of not concerning itself with weights and instead focuses on learning a good unitary matrix \(U\) with a specific cost function.
They take a different approach in how they build their unitaries. Here they have a “programmable” quantum circuit, where they consider the parameters defining this circuit as the ones that can be trained. Given that these parameters are classical, and the loss function they calculate is classical, no special optimisation techniques are needed.
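A toy illustration of this idea (my own sketch, not the circuit from the paper): treat a single rotation angle as the “program”, compute a classical infidelity loss against a target state, and train the angle with finite-difference gradient descent — no quantum optimisation machinery required.

```python
import numpy as np

# A one-parameter "programmable" circuit on one qubit: U(theta) = Ry(theta).
def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

ket0 = np.array([1.0, 0.0])
target = np.array([0.0, 1.0])  # we want U(theta)|0> = |1>

def loss(theta):
    # Classical loss: infidelity between U(theta)|0> and the target state.
    return 1.0 - abs(target @ (ry(theta) @ ket0)) ** 2

# Ordinary finite-difference gradient descent on the classical parameter.
theta, eta, eps = 0.5, 0.5, 1e-5
for _ in range(200):
    grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
    theta -= eta * grad

assert loss(theta) < 1e-4  # optimum at theta = pi, where U|0> = |1>
```

The real paper’s circuits are of course much richer than one rotation, but the training loop has this same shape: classical parameters, classical loss, classical optimiser.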
It appears that the building blocks are being put together to start doing some serious work on quantum machine learning/quantum deep networks. Google and Microsoft are already heavily investing in quantum computers; Google in particular has something it calls the “Quantum A.I. Lab”, and there are even independent quantum computer manufacturing groups.
It seems like there are lots of options on which way to direct efforts in the quantum ML world, and with these recent developments on quantum ML techniques, the time appears to be right to be getting into quantum deep learning!
More interesting quantum machine learning papers:
Here efficient means that the problem is in the complexity class called \(\textbf{P}\). Problems that are efficient for quantum computers are in the complexity class \(\textbf{BQP}\). One of the main outstanding questions in the field is “Are quantum computers more powerful than classical ones?” and this can be phrased as comparing the class \(\textbf{P}\) and \(\textbf{BQP}\).↩
“Something like” x here is a very informal term for the more formal statement that the running time is \(O(x)\). See Big O Notation for more.↩
For the past few months at work we’ve been putting up a Chalkboard in front of the office with jokes on it.
Today marks the 50th joke, so to celebrate I’m writing up the complete list. Most of the jokes here were ones we made up without looking at the internet; but occasionally, in an effort to have two new jokes every day, we picked some classics.
Q: What kind of parade did the astronauts throw for the computers after the mission?
A: A Turing tape parade!
Q: Why was the maths book sad?
A: It had too many problems.
Q: What did the AI say to the category theorist?
A: Does not commute!
(from Andy Kitchen)
Q: How did the OR programmer solve a MIP while also eating?
A: By using a brunch and bound technique.
[“hip”, “hip”]
Q: What do measure theorists and programmers have in common?
A: They both enjoy continuous integration.
Old mathematicians never die, they just lose some of their functions.
Q: Why did the computer keep sneezing?
A: It had a virus.
Q: Why wasn’t the complex beer successful?
A: People had trouble ordering it!
Q: Why did the functional programmer return her TV?
A: Because it was immutable.
Q: What do ruby and librarians have in common?
A: They both have explicit return policies.
A shepherd was out in the field counting her sheep; she counted 96 but when she rounded them up she had 100.
(from Two Lost Boys)
Q: Why was the computer owner so successful at sheep husbandry?
A: She had excellent RAM.
Q: Why couldn’t the formal system complete its homework?
A: It was trying to be consistent.
404: Joke Not Found.
Q: How does a lumberjack mathematician cut down trees?
A: With her Axiom.
Q: Why did the programmer go to her bookshelf before leaving her house?
A: She needed to get her keys from the dictionary.
Q: Why don’t you want to fight an OR consultant?
A: They are experts at duals.
Q: What did the Linux system administer for the programmer’s head cold?
A: Sudo ephedrine
Q: How did the physicist fix her car when it was failing intermittently?
A: She used statistical mechanics!
Q: Why was the bad python programmer so rich?
A: Because every time his code failed he got a raise.
Q: What do python programmers and event planners have in common?
A: They both like to decorate functions.
Q: Why is 0 the boss?
A: Because no other number can go above it!
Q: What did the mathematician say when they discovered a new prime number?
A: That’s odd.
Q: Why did the low-rank matrix go to the psychologist?
A: Because it was having an identity crisis!
Q: What is a floating point number’s favourite type of tennis?
A: Doubles!
Q: What do a blender and the Kalman filter have in common?
A: They both perform a smoothing function!
Q: What is the mathematician’s favourite kitchen item?
A: Derivasieve.
Q: Why don’t elephants use computers?
A: Scared of the mouse.
Q: Why was the OR consultant unwell?
A: She went on a Benders.
Q: What is a statistician’s favourite genre of music?
A: Drum and Bayes.
Q: What is a pet store operator’s favourite state in a multiplayer game?
A: The Parrot optimal state.
Q: What is the enterprise Java programmer’s favourite business book?
A: Scalaing up!
Q: What function is a tree hugger most concerned by?
A: \(\log(n)\).
Q: What is a garbologist’s favourite optimisation problem?
A: Bin packing.
Q: What is a choir’s favourite design pattern?
A: The Singleton pattern!
Q: What do you call a mathematician that has lots of statues in her garden?
A: Polygnomial.
Q: What do fashion designers and Haskell programmers have in common?
A: They love pattern matching!
Q: How did the mathematician impress at the dance party?
A: By showing off her step function!
No joke provided; the Curry-Howard isomorphism allows us to generate a programming joke from the maths joke.
Q: Why was the mathematician unhappy when she turned 24?
A: She now had a lot of factors to consider.
Q: Why was the programmer so poor?
A: Syntax.
Q: Why was the ML programmer late to the conference?
A: She spent too much time in the “train” stage.
Q: What number is good value?
A: 241
Exercise: What number is best value?
Q: Why couldn’t the python programmer get into her house?
A: Key error.
Q: How did the programmer get out of the deep end of the pool?
A: She made a pull request!
Q: Why was the ML researcher tired of shopping?
A: She was overfitting.
Q: How did the programmer get to the bottom of the ocean?
A: By sub-routine!
Q: How do you order citrus?
A: Use the real number lime.