OpenAI Codex: First Impressions


OpenAI Codex

OpenAI is an AI research and development company. You might have heard some buzz about one of its products: GPT-3, a language model that generates human-like text. It can be used for chat, text auto-completion, summarisation, grammar correction, translation, and more.

Codex is a descendant of GPT-3, trained on natural language and publicly available source code (e.g. from public GitHub repos). Codex translates a natural-language prompt into code. It is the very model that powers GitHub Copilot, an AI pair-programmer (check out the site for demos; it is fascinating).

OpenAI recently released an API to access Codex (in beta). The demos that accompanied the release caused quite a stir. Codex is proficient in a dozen programming languages. It can be used for code generation, refactoring, autocompletion, transpilation (translating source code between languages), code explanation, etc. To show off Codex, OpenAI recently organised a challenge.

The Challenge

The challenge was to solve a series of five programming puzzles in Python. The only twist: you could use Codex as a pair-programmer. It was a timed competition with a fixed time limit. Not surprisingly, Codex itself was a participant (not just a helper).

The problems were simple: ~830 "people" (Codex included) solved all five. I had to solve the first two manually because of OpenAI server issues. "Had to" because it was a race against time (and the top 500 won an OpenAI t-shirt). For the other three, however, I was able to call in the cavalry (it was pretty climactic).

Watching an AI generate code is a novel and amazing experience. Just type a docstring describing the procedure and watch the code develop. If you're an old-time programmer, you'll understand the feeling once you try it.

Illustration

Here is one problem statement for which I used Codex to generate a solution.

PROBLEM
Parse the given Python source code and return the list of full-qualified paths for all imported symbols, sorted in ascending lexicographic order.

CONSTRAINTS
The input will not contain any wildcard imports (from ... import *).
Ignore aliases (renamings): from x import y as z should be represented as x.y.

LIBRARY SUGGESTION
Consider using the ast module.

EXAMPLES
Input
import os
import concurrent.futures
from os import path as renamed_path
from typing import (
    List, Tuple
)

Output
['concurrent.futures', 'os', 'os.path', 'typing.List', 'typing.Tuple']
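
To see why the ast module is a natural fit here, it may help to look at what the parser produces for the example input. The snippet below is only an exploratory sketch: `describe_imports` and `EXAMPLE` are hypothetical names made up for illustration, not part of the challenge. It lists, for each imported symbol, the node type, the module, the name, and the alias.

```python
import ast

# The example input from the problem statement.
EXAMPLE = (
    "import os\n"
    "import concurrent.futures\n"
    "from os import path as renamed_path\n"
    "from typing import (\n"
    "    List, Tuple\n"
    ")\n"
)


def describe_imports(code):
    """Return (node type, module, name, alias) for every imported symbol."""
    rows = []
    for node in ast.parse(code).body:  # top-level statements only
        if isinstance(node, ast.Import):
            for alias in node.names:
                rows.append(("Import", None, alias.name, alias.asname))
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                rows.append(("ImportFrom", node.module, alias.name, alias.asname))
    return rows


for row in describe_imports(EXAMPLE):
    print(row)
```

The third row comes out as `('ImportFrom', 'os', 'path', 'renamed_path')`: the alias lives in a separate `asname` field, so ignoring renamings simply means never reading that field.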

Codex it!

I only wrote the docstring. Using the docstring, the imported libraries, and the function signature, Codex generated (almost) functional code:

[Image: Challenge]

Pretty impressive. After just one or two manual bug fixes, the code passed all test cases! Final script:

import ast
from typing import List


def parse_imports(code: str) -> List[str]:
    """
    Parse all the imports in the code using ast module.
    Imports of the form 'from x import y' should be appended as 'x.y'.
    Ignore any alias. Append each import type to a list
    and return the sorted list.
    """
    symbols = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            # 'import x.y' -> 'x.y' (name.name never contains the alias)
            for name in node.names:
                symbols.append(name.name)
        elif isinstance(node, ast.ImportFrom):
            # 'from x import y as z' -> 'x.y'
            for name in node.names:
                symbols.append(node.module + '.' + name.name)
    return sorted(symbols)


# Examples
print(parse_imports('import os'))
print(parse_imports('import os\nfrom typing import List'))
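
One case the script above does not cover is a relative import such as `from . import x`, where `node.module` is `None` and the string concatenation would raise a `TypeError`. The challenge statement says nothing about relative imports, so the following is only a sketch under my own assumption that they should keep their leading dots; `parse_imports_safe` is a hypothetical name.

```python
import ast
from typing import List


def parse_imports_safe(code: str) -> List[str]:
    """Variant of parse_imports that tolerates relative imports.

    Assumption (not from the challenge): 'from .pkg import y' is
    reported as '.pkg.y' and 'from . import x' as '.x'.
    """
    symbols = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            symbols.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.level counts the leading dots; node.module may be None.
            prefix = "." * node.level + (node.module or "")
            for alias in node.names:
                sep = "" if prefix.endswith(".") else "."
                symbols.append(prefix + sep + alias.name)
    return sorted(symbols)


example = (
    "import os\n"
    "import concurrent.futures\n"
    "from os import path as renamed_path\n"
    "from typing import (\n"
    "    List, Tuple\n"
    ")\n"
)
print(parse_imports_safe(example))
# ['concurrent.futures', 'os', 'os.path', 'typing.List', 'typing.Tuple']
```

On the example from the problem statement it returns the same list as the expected output, so the extra branch only matters for inputs outside the stated constraints.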

Implications

Although it could not beat all its human counterparts, Codex ranked an impressive #96 on the leaderboard. In all fairness, the competition format was many-to-some: everyone faced the same five problems, so Codex had access to a rich set of differently worded prompts for the same problems. That might give the AI a learning edge (if it learns actively from those prompts during the contest). Still, for competing against top-notch programmers, top 100 is quite a feat. I mean, contrast the statistics below (Codex vs. average player):

[Image: Challenge]

Does this mean I should start scouting career options? Can Codex just self-optimise and outcompete all programmers? I doubt it.

Okay, let us reason from first principles. Codex was trained on public code repos. The underlying framework of weights and biases is impractical to reason about directly, so let us take a spherical-cow approach. The contents of its dataset probably follow a heavily skewed (roughly logarithmic) distribution: the majority of the training split likely consisted of overlapping, generic, non-complex solutions (like database CRUD). Based on this (sensible) hypothesis, we can assert that roughly 80% of its statistical prowess lies in solving small, low-cognitive-load, well-defined, pure functions.

Building webpages, APIs, third-party integrations, database CRUD, etc.: the 80% of tasks that are non-creative and repetitive can probably be automated. Going by the Pareto principle, the remaining 20% (non-generalisable, complex, abstract problems), which take up 80% of cognitive bandwidth, will survive. But this is good news. Codex will handle all the tedious tasks, while programmers focus on the most creative jobs.

Once a programmer knows what to build, the act of writing code can be thought of as (1) breaking a problem down into simpler problems, and (2) mapping those simple problems to existing code (libraries, APIs, or functions) that already exist. The latter activity is probably the least fun part of programming (and the highest barrier to entry), and it’s where OpenAI Codex excels most.

Individual Problem Comparison

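| Problem | Team      | Solved In (Me) | Solved In (Codex) |
|---------|-----------|----------------|-------------------|
| 1       | Me        | 50:04          | 22:09             |
| 2       | Me        | 15:16          | 07:22             |
| 3       | Me, Codex | 18:20          | 19:24             |
| 4       | Me, Codex | 10:47          | 25:43             |
| 5       | Me, Codex | 11:49          | 14:13             |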
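
The margins in the table are easier to read as differences. The sketch below assumes the times are per-problem durations in minutes:seconds (the table does not say so explicitly); `to_seconds`, `to_stamp`, and `faster_by` are hypothetical helper names.

```python
from typing import Tuple

# (my time, Codex's time) per problem, copied from the table above.
TIMES = {
    1: ("50:04", "22:09"),
    2: ("15:16", "07:22"),
    3: ("18:20", "19:24"),
    4: ("10:47", "25:43"),
    5: ("11:49", "14:13"),
}


def to_seconds(stamp: str) -> int:
    """Convert 'mm:ss' to a number of seconds (assumed format)."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def to_stamp(total: int) -> str:
    """Convert a number of seconds back to 'm:ss'."""
    return f"{total // 60}:{total % 60:02d}"


def faster_by(me: str, codex: str) -> Tuple[str, str]:
    """Return who was faster on a problem and by how much."""
    diff = to_seconds(me) - to_seconds(codex)
    if diff == 0:
        return ("Tie", "0:00")
    return ("Codex" if diff > 0 else "Me", to_stamp(abs(diff)))


for problem, (me, codex) in TIMES.items():
    who, margin = faster_by(me, codex)
    print(f"Problem {problem}: {who} faster by {margin}")
```

Under that reading, Codex was faster on the two problems I solved by hand (by 27:55 and 7:54), while I was faster on the three where Codex helped me (by 1:04, 14:56, and 2:24).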

Conclusion

Software is eating the world. Apparently, at the expense of atoms. Yet this asymmetrically entitled ecosystem is inaccessible to most: programming demands a logical constitution, and tools like OpenAI Codex can loosen these constraints and dissolve the clique.

I think Codex gets close to what most of us really want from computers—we say what we want, and they do it. Programming languages are an artifact of computers not being able to actually understand us, and humans and computers relying on a lingua franca to understand each other.

— Sam Altman (@sama), X

For programmers, these tools act as excellent appendages. Programming languages are, by definition, means of communication with computers. Consider Codex to be a level-n programming language (with intermediate transpilation). One step closer towards symbiosis.