Whitepaper: Why Attention Functions make ideas flourish
Status: Draft. This is a first version of a whitepaper proposing a new file format: .aimm (Attention Instructing Memes). Aimm combines simple Markdown text blocks with a way to program in a time-varying probability of their importance.

Introduction

Right now, you the reader could be spending your attention on many different things. But you are reading this, engaging with its ideas, reflecting on them, criticizing them, or just passively skimming. In this paper I lay out a vision for how this process could become more structured, allowing any agent to program their attention.

Objects of Attention

Let’s take any given moment $t$ in the life of an agent. What will be the object of attention $OoA_t$ at that time? We can abstract this down to a set of experiences ${\rm I\!E}_{\text{Agent}}$ (also called the environment henceforth), which contains both its external stimuli and its internal representations of older stimuli or combinations thereof (memories, thoughts, imaginations). Experience here refers to a subset of all possible inputs (also called $\text{Observations}$ in the machine learning literature) and their representation (whether or not that is qualia).

$$OoA_t \subset {\rm I\!E}_{\text{Agent}} \\ {\rm I\!E}_{\text{Agent}} = {\rm I\!E}_{\text{Internal}} + {\rm I\!E}_{\text{External}} \subset {\rm I\!E}_{\text{All possible}}$$
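As a very rough illustration of this abstraction (the class, names, and example experiences below are invented for this sketch and are not part of any proposed format), the agent's environment can be modelled as two sets of experiences whose union contains the object of attention:

```python
from dataclasses import dataclass, field


@dataclass
class AgentEnvironment:
    """Toy model of IE_Agent: internal plus external experiences."""
    internal: set[str] = field(default_factory=set)  # memories, thoughts, imaginations
    external: set[str] = field(default_factory=set)  # current external stimuli

    @property
    def experiences(self) -> set[str]:
        # IE_Agent = IE_Internal + IE_External
        return self.internal | self.external


env = AgentEnvironment(
    internal={"memory of last meeting", "plan for tomorrow"},
    external={"this whitepaper", "phone notification"},
)

# The object of attention at time t is a subset of the agent's experiences.
ooa_t = {"this whitepaper"}
assert ooa_t <= env.experiences  # OoA_t ⊂ IE_Agent
```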

Now internal experiences are often triggered by external events, and it is unclear to me what one would experience without any inputs. This paper therefore focuses mostly on how the environment of an agent changes its set of experiences.

This definition allows us to ask multiple fundamental questions. First, one can compare the richness $R_E$ of one environment to another. $R_E$ is simply the size of the set ${\rm I\!E}_{\text{Agent}}$. A white room offers relatively few things one can spend one’s attention on, while the biggest increase in $R_E$ has come through the invention of the internet. There is an interesting question here of whether the increase in $R_E$ over the course of history has been merely correlated with civilizational progress or whether it could serve as a measure of it. But for this paper it suffices to say that $R_E$ is extremely large:

$$|{\rm I\!E}_{\text{Agent}}| \gg 1$$

Now, one fact that has been consistently reported by meditation practitioners is that $|OoA_t| = 1$, which means humans can only focus on one thing at a time. While I personally believe it is more complicated than this, for an arbitrarily large definition of “experience”, this is most likely true, possibly even for all agents. It definitely suffices for the current argument.

This leaves agents with a core problem though. If

$$(|OoA_t| = 1) \land (|{\rm I\!E}_{\text{Agent}}| \gg 1) \land (OoA_t \subset {\rm I\!E}_{\text{Agent}})$$

which experience should the agent focus on?
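Stated as code, the problem looks something like this (the numbers and labels are arbitrary; the point is only the mismatch between the size of the set and the size of the choice):

```python
import random

# |IE_Agent| >> 1: a huge set of possible experiences.
experiences = {f"experience_{i}" for i in range(10_000)}

# |OoA_t| = 1: the agent can attend to exactly one of them at a time.
# Without any further structure, the choice is essentially arbitrary.
ooa_t = {random.choice(sorted(experiences))}

assert len(ooa_t) == 1 and ooa_t <= experiences
```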

Attention is a weighting over experiences

Let’s suppose our agent has a goal $G$, which is a state of the environment $S_G$ it prefers over all other states $S_\text{all}$. It doesn’t have direct access to these states; all it can do is observe the environment. So a goal is the question: which actions should I take at point $t$, given that I want to observe $OoA_{G, t+n}$? Given infinite compute the agent could just do backcasting and ask what $OoA_{G, t+n-1}$ must be, and so on until $t + n = t$. But a more tractable approach would be to use probabilities.
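Before turning to probabilities, here is a hedged sketch of that backcasting idea, under the (strong) assumption that a `predecessor` function exists which, given a desired observation, returns the observation required one step earlier:

```python
def backcast(goal_observation, predecessor, n):
    """Walk backwards from the observation wanted at t+n to the present t.

    `predecessor` is a hypothetical function: given an observation, it returns
    the observation the agent would need one time step earlier. With infinite
    compute this yields a full chain OoA_{G,t}, ..., OoA_{G,t+n}.
    """
    chain = [goal_observation]
    for _ in range(n):
        chain.append(predecessor(chain[-1]))
    chain.reverse()  # ordered from t to t+n
    return chain


# Toy usage with a placeholder predecessor function.
plan = backcast("meeting my friend", lambda obs: f"whatever leads to {obs!r}", n=3)
for step, obs in enumerate(plan):
    print(f"t+{step}: {obs}")
```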

The core variable I will introduce in this paper is the Attention Probability $æ$:

$$æ_{i,t} = P(OoA_{i,t} \mid OoA_{G, t+n}) = P(\{{\rm I\!E}_{\text{Agent}}\}_i \mid S_G)$$

with $\sum æ_i = 1$.

Or in plain language: $æ$ is the attention you should put on any possible experience, given what you care about. This reduces the hard question of what to pay attention to to three questions:

  1. What is my goal? (At what time in the future do I want to experience what?)
  2. How rich is my environment? Can I change it so that it includes only experiences that would lead me to my goal?
  3. How will the attention functions of different experiences change over time? Do I need to do some things in order?

All three of these questions are still hard, but I will propose a solution that could make each a little more tractable.
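As a concrete (and deliberately simplistic) reading of the definition of $æ$ above, here is a sketch in which the conditional probabilities are replaced by hand-picked relevance scores that merely get normalised; the scores and experience names are invented for illustration:

```python
def attention_probabilities(relevance_to_goal: dict[str, float]) -> dict[str, float]:
    """Return æ_i for every experience i, normalised so that sum(æ_i) == 1."""
    total = sum(relevance_to_goal.values())
    return {exp: score / total for exp, score in relevance_to_goal.items()}


# Toy stand-ins for P(experience | goal) with the goal "meet my friend this evening".
relevance = {
    "phone notification": 0.1,
    "this whitepaper": 0.2,
    "message from my friend about tonight": 0.9,
}

ae = attention_probabilities(relevance)
print(ae)  # a weighting over experiences, given what the agent cares about
assert abs(sum(ae.values()) - 1.0) < 1e-9  # Σ æ_i = 1
```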

Attention functions

In the last chapter we saw that what one should pay attention to depends on three variables: the goal $G$, the environment ${\rm I\!E}_{\text{Agent}}$, and time $t$.

Let’s look at the simplest case $t$ first.

Time dependent attention functions in the wild

Let’s suppose our agent is a human with a simple goal: the human wants to meet with his friend in the evening. They also live on the same planet, so the environment contains that experience. What would the attention function for this context look like?

[Figure: sketch of the time-dependent attention function $æ(t)$ for meeting a friend in the evening]

Or more explicitly: it is mostly 0 during the day (the friend isn’t there), should stay at 1 the whole time they are together, and then goes back to 0 once the friend leaves.
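A minimal sketch of such a time-dependent attention function (the hours are invented; the shape is the point):

```python
def ae_meet_friend(hour: float, start: float = 19.0, end: float = 22.0) -> float:
    """Attention weight for 'being with my friend' at a given hour of the day."""
    return 1.0 if start <= hour < end else 0.0


for hour in (9, 14, 19, 21, 23):
    print(f"{hour:02d}:00 -> æ = {ae_meet_friend(hour)}")
```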

To be continued …

Memes that instruct attention replicate better

Attention Management as a convergent instrumental goal

Introducing an Attention Instructing file format

The motivating need for a universal attention instructing file format

The problem with recommendation silos

.aimm and .aiml

Summary