No. 90: 🤖 GitHub Copilot

+ 🐙🔮 Clark the Octopus, TikTok soccer streams, NY Phil data

🐙 Remember Paul the Octopus? 🔮

He was the soothsaying cephalopod who foresaw the results of the 2010 World Cup. They’d give him two boxes of food, each with a flag on it, and whichever one he ate from was his “prediction”. He kept getting it right, right up to the final.

Google made this Doodle of him for the 2014 World Cup:

Paul lived as fine a life as any octopus could wish for, rivalled in luxury only by Squiddly Diddly.

The Spanish state offered Paul protection, his zoo in Germany made a statue of him, and a group of businessmen in Galicia offered a €30,000 transfer fee to make him the headline act at the Fiesta del Pulpo. You have to wonder how many live octopuses feature at this fiesta, don’t you? But people made exceptions for Paul; he earned eternal respect through his prophecies.

Sure, some said Paul was just lucky. And yeah, the President of Iran called him “a symbol of Western decadence and decay”, but the avant-garde will always threaten the status quo.

🤨 We revere those who can see the future. They hold a special place in the collective imagination. They are justifiably rewarded with enough riches to make Croesus blush.

Where I’m going with this is, this octopus got what was coming to him. TV appearances, all the clams you could ever eat, immortality in cartoon form.

And with that, let us recall Friday’s special edition of hi, tech. and a prediction even Paul would approve of:

Heck, we even picked out Italy as the team to back before a ball was kicked.

If Google would like some inspiration for my upcoming Doodle, I offer this picture taken in my home last night:

Hope you all enjoyed a highly entertaining Euro 2020 final, and that you made some serious bones (money) from the Italy predictions.

Commiserations to England. They’ll have a strong chance of winning next year’s World Cup. The racist abuse of the players who missed the penalties is an absolute disgrace, too.

🔮 One more prediction: I’ll give the predictions a rest for a while. The chances of getting this lucky again are so slim, even Paul would steer clear.


👨‍✈️ GitHub Copilot: What You Need to Know

Now, let’s take a look at GitHub Copilot.

I chatted with Chip Huyen, lecturer in machine learning at Stanford University, to get the inside track. Chip produces a lot of fantastic content and is very much worth following on LinkedIn and/or Twitter. Her Machine Learning Interviews Book is superb, too.

What is GitHub Copilot?

GitHub (owned by Microsoft) and OpenAI have partnered to create Copilot, an AI-powered pair programmer. Microsoft invested $1 billion in OpenAI in 2019, too.

GitHub Copilot is powered by OpenAI’s Codex, which translates natural language into code.

Copilot is embedded in the Visual Studio Code editor (in beta for now), and it makes automated suggestions to programmers as they type. In the example below, the user provides a natural language prompt for context and begins typing code, before Copilot steps in to provide recommendations.

So this gets us a little closer to what this product really does. Think of the autocomplete suggestions you see in email software and you’re on the right track. Except that this is a much more complex problem to solve, with deeper implications if the solutions are flawed. Copilot also aims to go further than autocomplete, by inferring context from code to suggest altogether new functions.

It looks like this:

The idea of AI-assisted code is not new (hey, is any idea completely new?), but with Microsoft behind it Copilot has caused quite a stir. There are over 1250 comments on this thread about Copilot on Hacker News, for example.

Copilot: The experience

There is a waiting list to trial it and you better believe this joker is nowhere near the top of that list. I’d pencil me in for a tentative start date of 2025.

So I asked a real expert, Chip Huyen, about the experience of using Copilot. She says there are potential uses for Copilot beyond the obvious ones, such as helping to write documents. Codex’s natural language technology could prove its worth here.

Huyen also notes that the UI has room for improvement in its current form. “It’s very distracting to have code suggestions jump out at you when you write code”, she adds.

From what I have seen of the product in the wild, programmers can provide a short prompt and then Copilot provides suggested blocks of code. The programmer can scroll through these to find one that works. It’s easy to see how this could be distracting.
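To make the prompt-then-suggestion pattern a bit more concrete, here is a hand-written illustration in Python. To be clear, this is not real Copilot output: the function name `days_between` and the suggested body are my own assumptions about what such an exchange might look like.

```python
from datetime import date

# Prompt: the programmer types this comment and the function signature.
# compute the number of days between two ISO-format dates
def days_between(start: str, end: str) -> int:
    # Suggestion: the kind of body an assistant might propose
    # (written by hand for this illustration, not generated by Copilot).
    return abs((date.fromisoformat(end) - date.fromisoformat(start)).days)
```

The programmer still has to read the suggestion and decide whether it does what the comment asked for, which is where the expertise comes in.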

How it works

Copilot generates blocks of code, some of which have never been seen before. How does it do this?

Well, it is trained on millions of GitHub repositories, along with other open-source data.

In the FAQs on the Copilot site, they do attempt to address this directly:

They say that about 0.1% of the time the code will be taken verbatim from the training set. The other 99.9% of the time, it is using that training set as the basis either to synthesise different pieces of code or generate something new.

Some programmers would say the boundary between those two camps is not as fixed as GitHub would like you to believe. Ultimately, it is dependent on a training data set pulled from work by millions of real-life, human professionals.

I asked Chip Huyen about this and she said,

“It doesn't seem like the training process was too sophisticated and it looks like the model just overfits to training data (spitting out exact code from the training data), so there's risk of injecting bad/malware code into your product.”

Which brings us to our final question: Will Copilot replace human programmers?

And the answer seems like a hard no. It will certainly improve, both in its UI and its effectiveness, but this is intended as an assistant and it relies on a programmer’s expertise. Their job is about a whole lot more than typing out repetitive blocks of code, too.

The intellectual property questions are intriguing, and Microsoft is adamant that the training data falls under the “fair use” category. There is precedent here: Google Books, for example, successfully argued a similar case.

For now, that’s what you need to know about GitHub Copilot!


📺 Social Streaming

When’s the last time you went on the Burnley Football Club website?

Actually: When’s the first time you went on the Burnley Football Club website?

For me, the two answers are the same: Yesterday.

But went there I did, all in the name of this announcement:

“Burnley FC Women will become the first team to have their games streamed live on TikTok – which will also become the team’s sleeve sponsor – as it undergoes significant expansion as part of the club’s new women’s football strategy.”

Amazon already streams sports and there have been rumours about Facebook buying football rights, but this is still a striking moment. For one, it’s a women’s team that is taking the lead by showing games on TikTok.

Also, it is a sign of what is to come on TikTok - don’t count them out of the so-called “streaming wars”.

📈 Exploratory Data Analysis of the week

I found a data set with every performance by the New York Philharmonic and I do miss going to see those guys, so I took a look.

The New York Philharmonic played its first concert on December 7, 1842. There’s a lot of data to look at, is what I’m saying.

For a first glance, I pulled out the most popular composers from 1842-present and not so surprisingly, Ludwig van takes the crown.
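For the curious, a first pass like this can be sketched in a few lines of Python. This is a rough sketch rather than my exact analysis: the rows below are made-up toy data, and the `composer` field name is an assumption about how the records might be shaped, not the real data set's schema.

```python
from collections import Counter

# Toy rows standing in for the real performance records (made up for illustration).
# The "composer" and "work" keys are assumed names, not the data set's actual schema.
performances = [
    {"composer": "Beethoven", "work": "Symphony No. 5"},
    {"composer": "Mozart", "work": "Symphony No. 40"},
    {"composer": "Beethoven", "work": "Symphony No. 7"},
    {"composer": "Mahler", "work": "Symphony No. 1"},
    {"composer": "Mozart", "work": "Piano Concerto No. 21"},
    {"composer": "Beethoven", "work": "Violin Concerto"},
]

def top_composers(rows, n=3):
    """Count how often each composer appears and return the n most frequent."""
    counts = Counter(row["composer"] for row in rows if row.get("composer"))
    return counts.most_common(n)

print(top_composers(performances, 2))
# [('Beethoven', 3), ('Mozart', 2)]
```

Swap the toy list for the real records and the same counting logic applies, though the real data would need some cleaning first.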

I’ve started looking at how they have diversified over the years too, in terms of composers but also venues.


😔 The Data Hater

From the sublime to the ridiculous. We chew all the meats of our data-driven stew here at hi, tech.

I love Penguin and they make a fantastic point here - not that the visualisation would help you realise it:

🤖 Tech Bites

This week, I’m sharing some articles about the state of the online world.

  • Social media is broken: Here’s how we can fix it - MIT

“Social media is rewiring the central nervous system of humanity in real time,” said MIT Sloan professor Sinan Aral, who led the event. “We’re now at a crossroads between its promise and its peril.”

“The glue that holds humanity’s knowledge together is coming undone.”

  • The AI Wolf That Preferred Suicide Over Eating Sheep - OneZero

“An unemotional response from the AI of a simple game sparked off an emotional response among Chinese netizens, causing the kamikaze wolf anecdote to spread throughout various local social media platforms.”

Lots of excellent research in this guide from Conductor, based on international search data.


☝️ And finally…

It’s back! The Pudding’s AI New Yorker caption competition - Pudding

My dream is to win the New Yorker cartoon caption competition and this AI is aiming for the same.

This is likely to be of greater significance to me than GitHub Copilot, if we’re being honest.

See you next time!