New language that learns


You can comment on everything. Or give me (harsh) personal feedback.


Dedicated to my family.

Status: Draft

This is an attempt at creating a new language. It is inspired by a lot of things: Loglan, Assembly / Huffman / Stephen Wolfram, skill or tech trees (most recently and noticeably Factorio), Neal Stephenson’s Primer, my ideas from 10 years ago around 3D language (think hieroglyphs / emojis optimized for VR), colored equations, conversations with Will DePue, … I’m wondering if we can build a purely functional, relational visual representation of all knowledge with dynamically optimized and hierarchically collapsed naming.

Ok, that was a mouthful, let me try to explain.

Basically, I want to build this:

for every word or sentence out there, but such that it collapses all the branches you already know. Imagine reading a research paper that contains only the functional relations that are new and important to you.

Let’s say I send you a random paper: Catalytic Synthesis of Polyribonucleic Acid on Prebiotic Rock Glasses | Astrobiology (liebertpub.com)

Instead of a wall of text, you would see something like this:

This is an early manual version of the final idea I have, but notice some of the details:

The most important novel concept, the function of rock glasses on nucleotides, is clearly seen in the center. This is the novel information I didn’t know before.
The motivation for the question is clear from the context: rock glass (even though I don’t know what that is yet) existed on the early Earth.
Nucleotides are not named as such; instead they appear as ATP, GTP, etc., because I personally remember those better. The details of how they work are left out because I already know how triphosphates make RNA.
Instead, the graph focuses on the rock glasses and the assay (proof) of the function in question. Less important concepts are smaller; in a final implementation one could zoom in and learn everything about mafic intrusive rock and electrophoresis.

This graph focuses on the most important assemblies (a function of one concept to another concept) by being

a) embedded in a global graph of all knowledge, replacing citations and hyperlinks with input functions and their robustness, aka “weight of causality” (How much would the conclusions change if any of the inputs becomes less certain?)

b) embedded in your local graph of personal knowledge, replacing large chunks of the graph with abbreviations / shorthand / notation, if (and only if) those parts are well understood, and visualizing novelty (surprise) times importance (given a larger goal).
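To make (b) a bit more concrete, here is a minimal sketch of how collapsing could work. Everything in it is an assumption for illustration: the `Concept` class, the `understood` flag, and the `novelty` and `importance` scores (including their values) are hypothetical names and numbers, not part of any existing tool. A branch is collapsed to shorthand only if it and all of its inputs are well understood; everything else is shown with a weight of novelty times importance.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    understood: bool = False   # hypothetical: the reader already knows this well
    novelty: float = 0.0       # hypothetical surprise score in [0, 1]
    importance: float = 0.0    # hypothetical relevance to the reader's goal in [0, 1]
    inputs: list["Concept"] = field(default_factory=list)


def fully_understood(c: Concept) -> bool:
    """A branch may collapse if (and only if) every part of it is understood."""
    return c.understood and all(fully_understood(i) for i in c.inputs)


def render(c: Concept, depth: int = 0) -> list[str]:
    """Return indented lines, collapsing fully understood branches to shorthand."""
    pad = "  " * depth
    if fully_understood(c):
        return [f"{pad}[{c.name}]"]          # collapsed: just the pointer
    weight = c.novelty * c.importance        # display size of the node
    lines = [f"{pad}{c.name} (weight={weight:.2f})"]
    for i in c.inputs:
        lines.extend(render(i, depth + 1))
    return lines


# Toy version of the paper example; all scores are made up.
triphosphates = Concept(
    "ATP, GTP, ...", understood=True,
    inputs=[Concept("triphosphate chemistry", understood=True)],
)
glass = Concept(
    "rock glass", novelty=0.9, importance=0.8,
    inputs=[Concept("mafic intrusive rock", novelty=0.6, importance=0.2)],
)
claim = Concept(
    "rock glass catalyzes RNA synthesis", novelty=1.0, importance=1.0,
    inputs=[glass, triphosphates],
)

lines = render(claim)
print("\n".join(lines))
```

In this toy run the triphosphate branch shrinks to a single bracketed pointer, while the rock glass branch stays expanded. Where the scores would actually come from (an LLM, reading history, something else) is left open here.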

I think LLMs are now good enough that such a thing could be built. If you want to help, get in touch.

How do you explain “RNA”

The reason I think this would be helpful is that I, and probably others, already think like this. When we read a not-yet-understood text or try to learn something, we first identify which components are important to understand (in most contexts this is done through sheer exposure: going to a different country, taking advanced classes, watching dubs, etc., letting our brains build an actual histogram of attention-worthy things in the background). We then go back and click around a bunch of hyperlinks / look stuff up (using our tacit histogram to guide our intuitive curiosity in the right direction). We’re basically trying to build a tech tree and then execute on it.

Let’s illustrate this with a recent example of mine: what is scRNA seq?

Well, it’s single-cell RNA sequencing.

Ok, what does single-cell mean in this context? I guess sequencing refers to reading a sequence. But what is RNA?

Let’s look at a few definitions and see what they offer.

The first one I found was:

ribonucleic acid : an important chemical present in all living cells

RNA | English meaning - Cambridge Dictionary

This is well … somewhat helpful. I call this the “XKCD Way”, from the Thing Explainer. In an important sense, this is the “fully expanded view”: it’s the result of recursively breaking down each word’s definition until one gets to some accepted list of known words. This is good, because you can somewhat understand what anything means; what makes a word sensible in the first place is that it can be “flattened” to a subset of acceptable “axiomatic” words (usually experiences, simple relations, the first 1000 English words). All words are on the tech tree, if we make the starting points large enough. The main problem is that it tells us little: we are very far away in the tech tree, and can’t even build back up to the word in question. Notice how the text has links. In principle you could click your way through them and try to get near RNA again, but that would be a challenge, probably even impossible (how do you get from “chemical” to RNA?).
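The “flattening” idea can be sketched in a few lines. This is a toy under loud assumptions: the `definitions` table and the `axioms` set below are invented for illustration (they are not taken from the Cambridge Dictionary or the Thing Explainer), and real definitions are sentences, not neat word lists.

```python
def flatten(word, definitions, axioms, _seen=None):
    """Expand `word` recursively and return the axiomatic words it bottoms out in."""
    seen = set() if _seen is None else _seen
    if word in axioms:
        return {word}
    if word in seen or word not in definitions:
        return set()  # already expanded elsewhere, or simply undefined
    seen.add(word)
    result = set()
    for part in definitions[word]:
        result |= flatten(part, definitions, axioms, seen)
    return result


# Hypothetical toy dictionary: each word maps to the words used to define it.
axioms = {"thing", "small", "part", "living"}
definitions = {
    "RNA": ["chemical", "cell"],
    "chemical": ["thing", "small", "part"],
    "cell": ["small", "living", "part"],
}

flat = flatten("RNA", definitions, axioms)
print(sorted(flat))
```

The flattened set is perfectly valid and almost useless: many other words would flatten to the same handful of axioms, which is the “very far away in the tech tree” problem in miniature.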

So let’s try another definition, from Wiktionary:

1. (biochemistry) A derivative of DNA having ribose in place of deoxyribose, and uracil in place of thymine; its primary function is in the transcription of genetic material and subsequent synthesis of protein.

Now, this is basically the N-1 definition: all the words are close to RNA on the tech tree (we even get a notice that we are on the biochemistry branch!).

But there are still some interesting problems if you expand them.

First, some words are needlessly hard, but undefined (like subsequent). But more interestingly, expanding transcription gives:

1. (genetics) The synthesis of RNA under the direction of DNA.

Note how all these words are already defined! Without trying, I already found a complex, but entirely circular definition. (There is also some logical confusion around genetic material: does RNA only transcribe genes or all parts of DNA?)
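Finding such loops mechanically is a plain graph search. Below is a minimal sketch, assuming the definitions have already been reduced to a word-to-words table; the table here is my hand-picked reading of the two Wiktionary entries above, and `find_cycle` is a hypothetical helper name.

```python
def find_cycle(word, definitions, path=()):
    """Return the first definitional loop reachable from `word`, or None."""
    if word in path:
        # We came back to a word we are still expanding: that is the loop.
        return list(path[path.index(word):]) + [word]
    for part in definitions.get(word, []):
        cycle = find_cycle(part, definitions, path + (word,))
        if cycle:
            return cycle
    return None


# Key terms hand-picked from the two definitions quoted above.
definitions = {
    "RNA": ["DNA", "ribose", "uracil", "transcription", "protein"],
    "transcription": ["synthesis", "RNA", "DNA"],
}

loop = find_cycle("RNA", definitions)
print(loop)  # ['RNA', 'transcription', 'RNA']
```

This naive depth-first search re-explores shared branches, so it would need memoization on a dictionary-sized graph. The harder part is the step it assumes away: turning messy prose definitions into that clean table.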

What do I mean by circular? The definition simply includes some of the things that are themselves defined by RNA (that follow it in the tech tree). This is fine, but clearer if one uses an actual tech tree:

Or, based on the actual assembly (time):

There is some confusion created here (is DNA synthesized from RNA, or from scratch?), but that is more a curious question enabled by the visualization. In general, this feels way more expressive and clearer to me than the original sentence.

Some things to note:

Words are sometimes pointers to things, but more often they are pointers to functions (relations between things)

This means you can turn every word into a tech tree
Jargon is just collapsing some area of the tech tree and giving it a pointer

No jargon is equally helpful to two people: we all have different preexisting knowledge of the tree, and pointing at regions without making sure the other has mapped that region well is bad practice

Dictionaries (and human language) are often defined cyclically, where pointers reference themselves

This means it’s actually hard to build a recursive “tree-expander” for words, using something like Wiktionary. LLMs are needed to make sense of the messiness of human language.
