Butterfly Tessellation – YouTube
January 20, 2026

Ribbons, Friezes and Mathematics | Mathematics Education
January 19, 2026

Filming Locations for The Gilded Age, HBO
January 18, 2026

The loom: programming patterns on a path to computing | Loomery
January 17, 2026

Hulken Bag
January 16, 2026
Here’s the link to the rolling bag: https://hulken.com/. It also comes in a smaller size. It has truly saved my back.
math video
January 11, 2026
3Blue1Brown. (2020, August 19). Group theory, abstraction, and the 196,883-dimensional monster [Video]. YouTube. https://www.youtube.com/watch?v=mH0oCDa74tE
RL Project
January 11, 2026
some reading materials on RL:
https://arxiv.org/pdf/2412.05265
Murphy, K. (2024, December 6). Reinforcement Learning: An Overview. arXiv.org. https://arxiv.org/abs/2412.05265
QT:{{”
Reinforcement learning or RL is a class of methods for solving various kinds of sequential decision making tasks. In such tasks, we want to design an agent that interacts with an external environment. The agent maintains an internal state z_t, which it passes to its policy π to choose an action a_t = π(z_t). The environment responds by sending back an observation o_{t+1}, which the agent uses to update its internal state using the state-update function z_{t+1} = SU(z_t, a_t, o_{t+1}). See Figure 1.1 for an illustration.
To simplify things, we often assume that the environment is also a Markovian process, which has internal world state w_t, from which the observations o_t are derived. (This is called a POMDP — see Section 1.2.1.) We often simplify things even more by assuming that the observation o_t reveals the hidden environment state; in this case, we denote the internal agent state and external environment state by the same letter, namely s_t = o_t = w_t = z_t. (This is called an MDP — see Section 1.2.2.) We discuss these assumptions in more detail in Section 1.1.3.
RL is more complicated than supervised learning (e.g., training a classifier) or self-supervised learning (e.g., training a language model), because this framework is very general: there are many assumptions we can make about the environment and its observations o_t, and many choices we can make about the form of the agent’s internal state z_t and policy π, as well as the ways to update these objects as we see more data. We will study many different combinations in the rest of this document. The right choice ultimately depends on which real-world application you are interested in solving.”}}
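To make the quoted loop concrete, here is a minimal Python sketch of the agent-environment interaction it describes: the agent keeps an internal state z_t, picks a_t = π(z_t), receives o_{t+1}, and updates z_{t+1} = SU(z_t, a_t, o_{t+1}). The environment, policy, and state-update function below (GridEnv, random_policy, state_update) are toy placeholders of my own, not anything from Murphy's paper; in this fully observed toy case the setup collapses to the MDP situation where s_t = o_t = w_t = z_t.

import random

class GridEnv:
    """Tiny 1-D corridor: the hidden world state w_t is a position 0..4;
    the observation o_t fully reveals it, so this is effectively an MDP."""
    def __init__(self):
        self.w = 0  # hidden world state w_t

    def step(self, action):
        # action is -1 (left) or +1 (right); reward 1 for reaching the end
        self.w = max(0, min(4, self.w + action))
        reward = 1.0 if self.w == 4 else 0.0
        observation = self.w          # o_{t+1} reveals w_{t+1}
        return observation, reward

def state_update(z, a, o):
    # SU(z_t, a_t, o_{t+1}): with full observability the new agent state
    # is simply the latest observation.
    return o

def random_policy(z):
    # π(z_t): here a uniformly random action; a real policy would depend on z.
    return random.choice([-1, +1])

env = GridEnv()
z = 0                                 # initial agent state z_0
for t in range(20):
    a = random_policy(z)              # a_t = π(z_t)
    o, r = env.step(a)                # environment sends back o_{t+1}, reward
    z = state_update(z, a, o)         # z_{t+1} = SU(z_t, a_t, o_{t+1})
    print(f"t={t} action={a:+d} obs={o} reward={r}")

Swapping random_policy for something that actually uses z (for example, a learned value function) is where the RL methods surveyed in the paper come in; the interaction loop itself stays the same.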