Posts Tagged ‘tutorial’

LDA resources

April 12, 2026

https://www.ibm.com/think/topics/latent-dirichlet-allocation What is Latent Dirichlet allocation | IBM

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022. https://jmlr.csail.mit.edu/papers/v3/blei03a.html https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf

GeeksforGeeks. (2025, July 23). Topic modeling using Latent Dirichlet Allocation (LDA). GeeksforGeeks.
https://www.geeksforgeeks.org/nlp/topic-modeling-using-latent-dirichlet-allocation-lda/

Ganegedara, T. (2025, February 2). Intuitive Guide to Latent Dirichlet Allocation. Towards Data Science.
https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-latent-dirichlet-allocation-437c81220158/

some blogs
https://johaupt.github.io/blog/Topic_modeling_with_Gibbs_sampling_in_R.html https://agustinus.kristia.de/blog/lda-gibbs/
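The Gibbs-sampling posts above can be boiled down to a minimal collapsed Gibbs sampler for LDA. This NumPy sketch is my own (toy hyperparameters alpha and beta, variable names mine); it implements the standard full-conditional update those blogs derive:

```python
import numpy as np

def lda_gibbs(docs, n_topics, n_vocab, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA. docs: list of lists of word ids."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # doc-topic counts
    n_kw = np.zeros((n_topics, n_vocab))     # topic-word counts
    n_k = np.zeros(n_topics)                 # tokens per topic
    z = []                                   # topic assignment per token
    for d, doc in enumerate(docs):           # random initial assignments
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # full conditional: p(z=k) ∝ (n_dk+alpha)(n_kw+beta)/(n_k+V*beta)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                  # resample and restore counts
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return n_dk, n_kw
```

On a toy corpus where documents use disjoint vocabularies, the topic-word counts `n_kw` should concentrate each vocabulary in its own topic.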

Programming Differential Privacy

April 5, 2026

https://programming-dp.com/

Some snippets I liked….

QT:{{”
To implement a function to check whether a dataframe satisfies k-Anonymity, we loop over the rows; for each row, we query the dataframe to see how many rows match its values for the quasi-identifiers. If the number of rows in any group is less than k, the dataframe does not satisfy k-Anonymity for that value of k, and we return False. Note that in this simple definition, we consider all columns to contain quasi-identifiers; to limit our check to a subset of all columns, we would need to replace the df.columns expression with something else.
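“}}

The check described in the quote can be sketched with pandas. This is my own sketch, not the book's code (toy dataframe; treating all columns as quasi-identifiers unless a subset is passed, as the quote suggests):

```python
import pandas as pd

def is_k_anonymous(df, k, quasi_ids=None):
    """Return True if every combination of quasi-identifier values
    in df is shared by at least k rows."""
    cols = quasi_ids if quasi_ids is not None else list(df.columns)
    for _, row in df.iterrows():
        # count rows matching this row's quasi-identifier values
        matches = (df[cols] == row[cols]).all(axis=1).sum()
        if matches < k:
            return False
    return True

df = pd.DataFrame({'age': [30, 30, 40], 'zip': ['01234', '01234', '05678']})
result = is_k_anonymous(df, 2)   # the age-40 row is unique, so this is False
```

QT:{{”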

A function which satisfies differential privacy is often called a mechanism. We say that a mechanism F satisfies ε-differential privacy if for all neighboring datasets x and x′, and all possible sets of outputs S (where S refers to “sets of outputs of the mechanism”):

Pr[F(x) ∈ S] ≤ e^ε Pr[F(x′) ∈ S]

The parameter ε in the definition is called the privacy parameter or the privacy budget. ε provides a knob to tune the “amount of privacy” the definition provides. Small values of ε require F to provide very similar outputs when given similar inputs, and therefore provide higher levels of privacy; large values of ε allow less similarity in the outputs, and therefore provide less privacy.

Note that F is typically a randomized function, which has many possible outputs under the same input. Therefore, the probability distribution describing its outputs is not just a point distribution.

In the definition of ε-differential privacy, the probability is taken over the randomness of the algorithm itself—that is, over the internal randomness used by the privacy mechanism to produce an output.

The important implication of this definition is that F’s output will be pretty much the same, with or without the data of any specific individual. In other words, the randomness built into F should be “enough” so that an observed output from F will not reveal which of x or x′ was the input. Imagine that my data is present in x but not in x′. If an adversary can’t determine which of x or x′ was the input to F, then the adversary can’t tell whether or not my data was present in the input – let alone the contents of that data.

According to the Laplace mechanism, for a function f(x) which returns a number, the following definition of F satisfies ε-differential privacy: …

The sensitivity of a function f is the amount f’s output changes when its input changes in a minimal way. Intuitively, for a simple function with one numeric input, we think of the scenario where the input increases or decreases (changes) by 1.

More generally, however, sensitivity is defined in terms of adjacent dataset inputs.

Two datasets are said to be adjacent if they differ in the data of exactly one individual. This could mean adding or removing a single row (in the add-remove model) or changing a single row (in the substitution model). This notion of adjacency defines the smallest possible difference between datasets, and it forms the basis for reasoning about privacy guarantees.

The global sensitivity of a function is then defined as the maximum amount its output can change between any pair of adjacent datasets.

Sensitivity is a complex topic, and an integral part of designing differentially private algorithms; we will have much more to say about it later. For now, we will just point out that counting queries always have a sensitivity of 1: if a query counts the number of rows in the dataset with a particular property, and then we modify exactly one row of the dataset, then the query’s output can change by at most 1.

Thus we can achieve differential privacy for our example query by using the Laplace mechanism with sensitivity 1 and an ε of our choosing. For now, let’s pick ε = 0.1. We can sample from the Laplace distribution using NumPy’s random.laplace.

import numpy as np

sensitivity = 1
epsilon = 0.1
# 'adult' is the dataframe loaded earlier in the book
adult[adult['Age'] >= 40].shape[0] + np.random.laplace(loc=0, scale=sensitivity/epsilon)
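“}}

The one-liner above can be wrapped as a reusable helper. This is my own sketch of that pattern (the function name is mine, not necessarily the book's):

```python
import numpy as np

def laplace_mech(v, sensitivity, epsilon):
    """Laplace mechanism: add Lap(sensitivity/epsilon) noise
    to a numeric query result v."""
    return v + np.random.laplace(loc=0, scale=sensitivity / epsilon)

# e.g. a noisy count with sensitivity 1 and epsilon = 0.1:
# noisy_count = laplace_mech(true_count, sensitivity=1, epsilon=0.1)
```

QT:{{”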


Sequential Composition

The first major property of differential privacy is sequential composition [11, 12], which bounds the total privacy cost of releasing multiple results of differentially private mechanisms on the same input data. Formally, the sequential composition theorem for differential privacy says that: if F1(x) satisfies ε1-differential privacy and F2(x) satisfies ε2-differential privacy, then the mechanism which releases both results, G(x) = (F1(x), F2(x)), satisfies (ε1 + ε2)-differential privacy.

Sequential composition is a vital property of differential privacy because it enables the design of algorithms that consult the data more than once. Sequential composition is also important when multiple separate analyses are performed on a single dataset, since it allows individuals to bound the total privacy cost they incur by
participating in all of these analyses. The bound on privacy cost given by sequential composition is an upper bound – the actual privacy cost of two particular differentially private releases may be smaller than this, but never larger.
“}}
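As a toy numeric illustration of sequential composition (the query and all values here are made up by me): two Laplace releases on the same count, at ε1 and ε2, cost ε1 + ε2 in total:

```python
import numpy as np

def laplace_release(true_value, sensitivity, epsilon):
    # one epsilon-differentially-private release via the Laplace mechanism
    return true_value + np.random.laplace(loc=0, scale=sensitivity / epsilon)

true_count = 1000            # hypothetical counting-query result (sensitivity 1)
eps1, eps2 = 0.5, 0.5
r1 = laplace_release(true_count, 1, eps1)   # satisfies eps1-DP
r2 = laplace_release(true_count, 1, eps2)   # satisfies eps2-DP

# By sequential composition, releasing (r1, r2) together satisfies
# (eps1 + eps2)-differential privacy; the bound is an upper bound.
total_epsilon = eps1 + eps2
```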

Huntington disease – PubMed

March 29, 2026

https://pubmed.ncbi.nlm.nih.gov/27188817/

ASO v RNAi v siRNA

QT:{{”

how a specific toxic conformation might be favoured within the expanded polyQ of monomeric HTT exon1 is unclear [37,47]. More-complex conformational effects in monomeric HTT exon1 linked to polyQ repeat length are formally possible but challenging to establish [37,49]. By contrast, the widely reported ability of HTT exon1 to readily form a variety of aggregated structures presents an array of plausible candidates that might mediate toxicity (see below) [37]. This aggregation links Huntington disease to other neurodegenerative diseases that feature a protein aggregation component, including Alzheimer disease, Parkinson disease, amyotrophic lateral sclerosis and spongiform encephalopathies.

bind to HTT mRNA selectively and target it for degradation by cellular mechanisms. When the agent is a short interfering RNA (siRNA) or microRNA, the HTT mRNA is degraded by cytoplasmic RNA-induced silencing complex (RISC) — a process known as RNA interference (RNAi). Alternatively, a single-stranded modified DNA molecule or antisense oligonucleotide (ASO) can be used to direct the transcript for degradation by nuclear ribonuclease H.
“}}

Bates, G. P., Dorsey, R., Gusella, J. F., Hayden, M. R., Kay, C., Leavitt, B. R., Nance, M., Ross, C. A., Scahill, R. I., Wetzel, R., Wild, E. J., & Tabrizi, S. J. (2015). Huntington disease. Nature Reviews Disease Primers, 1(1), 15005.
https://doi.org/10.1038/nrdp.2015.5

from G search {{

Yes, amyloid fibrils in Huntington’s disease (HD) contain a specific protein—the mutated huntingtin (Htt) protein. These fibrils are formed specifically from the N-terminal exon 1 fragment of the mutant protein, which contains an expanded polyglutamine (polyQ) tract that forms the amyloid core.
….
Although they contain the mutant protein, the amyloid fibrils in HD are distinct from those in Alzheimer’s (Aβ) or Parkinson’s (α-synuclein) diseases.

}}

Transformer youtube video

March 6, 2026

YouTube links to two transformer videos that are helpful for understanding the transformer:

https://www.youtube.com/watch?v=wjZofJX0v4M
https://www.youtube.com/watch?v=eMlx5fFNoYc

3Blue1Brown. (2024, April 1). Transformers, the tech behind LLMs | Deep Learning Chapter 5 [Video]. YouTube.
https://www.youtube.com/watch?v=wjZofJX0v4M

3Blue1Brown. (2024). Attention in transformers, step-by-step | Deep Learning Chapter 6 [Video]. YouTube.
https://www.youtube.com/watch?v=eMlx5fFNoYc

LDA resources

February 28, 2026

Ganegedara, T. (2025, February 2). Intuitive Guide to Latent Dirichlet Allocation. Towards Data Science.
https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-latent-dirichlet-allocation-437c81220158/

some blogs
https://johaupt.github.io/blog/Topic_modeling_with_Gibbs_sampling_in_R.html https://agustinus.kristia.de/blog/lda-gibbs/

Concepts, estimation and interpretation of SNP-based heritability – Nature Genetics

February 22, 2026

See Box 1, viz:

QT:{{”

Box 1: Statistical model used in the GREML approach to estimate h²_SNP

The statistical model used by GREML can be described in its simplest form as

y = Wu + e

where y is an n × 1 vector of standardized phenotypes with n equal to the sample size, W = {w_ij} is an n × m standardized SNP genotype matrix where m is the number of SNPs, u = {u_i} is an m × 1 vector of the additive effects of all variants when fitted jointly in the model, u ~ N(0, Iσ²_u) with I being an identity matrix, and e is a vector of residuals, e ~ N(0, Iσ²_e). An equivalent model is….

y = g + e
g ~ N(0, Aσ²_g)
A = WW′/m

In practice, A is called the SNP-derived genetic (or genomic) relationship matrix (GRM) and is estimated from the SNP data. The estimate … from GREML can be described as the estimated variance explained by all the SNPs (mσ²_u) or equivalently as the estimated genetic variance, by contrasting the phenotypic similarity between unrelated individuals to their SNP-derived genetic similarity “}}
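A quick NumPy sketch of my own for building the GRM from standardized genotypes (toy random data; scaling A by the number of SNPs m follows the paper's convention):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 200                                        # individuals, SNPs
geno = rng.integers(0, 3, size=(n, m)).astype(float)  # 0/1/2 allele counts

# standardize each SNP column to mean 0, variance 1 (the W matrix of Box 1)
W = (geno - geno.mean(axis=0)) / geno.std(axis=0)

# SNP-derived genetic relationship matrix A = WW'/m
A = W @ W.T / m
```

With this standardization the diagonal of A averages 1, matching the interpretation of A as a relationship matrix among (here, unrelated) individuals.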

https://www.nature.com/articles/ng.3941

Yang, J., Zeng, J., Goddard, M. E., Wray, N. R., & Visscher, P. M. (2017). Concepts, estimation and interpretation of SNP-based heritability. Nature Genetics, 49(9), 1304–1310.
https://doi.org/10.1038/ng.3941

MCB111 Mathematics in Biology

February 22, 2026

http://mcb111.org/w06/w06-lecture.html
Has some nice textbook downloads – e.g.
http://mcb111.org/w06/KollerFriedman.pdf

QT:{{”
There are many good books to learn about probabilistic models. “Probabilistic graphical models: principles and techniques” (by Koller & Friedman) is a comprehensive source about more general probabilistic models than the one we are going to study here.
“}}
Subset of chap 7 focuses on GMRF

Send ppt from our chat

February 22, 2026

good tutorial/textbook chapter/review papers on Poisson regression:

Below are a few readings that discuss how to fit a generalized linear mixed model.

1. Breslow & Clayton (1993), JASA, Approximate Inference in
Generalized Linear Mixed Models.
https://doi.org/10.2307/2290687
A classic statistical paper introducing the Laplace approximation and penalized quasi-likelihood for GLMMs

2. Bates (2011), Mixed models in R using the lme4 package Part 5: Generalized linear mixed models
https://lme4.r-forge.r-project.org/slides/2011-03-16-Amsterdam/5GLMMH.pdf

3. Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
https://doi.org/10.18637/jss.v067.i01
4. Bates (2025), Computational methods for mixed models
https://cran.r-project.org/web/packages/lme4/vignettes/Theory.pdf

These are written by the authors of the lme4 package, discussing the details of how a mixed-effects model (more specifically, a generalized linear mixed-effects model) is trained using the PIRLS approach.
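As a minimal, self-contained companion to these readings: the fixed-effects core of the problem, Poisson regression with a log link, can be fitted by iteratively reweighted least squares (the IRLS that PIRLS extends with penalization for the random effects). This NumPy sketch on simulated data is mine, not from the readings:

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Fit Poisson regression (log link) by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 0.5)         # safe starting intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                     # mean under the log link
        W = mu                               # IRLS weights for Poisson
        z = eta + (y - mu) / mu              # working response
        # weighted least squares step: beta = (X'WX)^{-1} X'Wz
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# simulate y ~ Poisson(exp(0.5 + 1.2 x)) and recover the coefficients
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 1.2 * x)).astype(float)
beta_hat = poisson_irls(X, y)
```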

Master Equation Notes – Following Up on Your Questions

February 22, 2026

1. Paulsson (2005) – “Models of stochastic gene expression”
https://www.sciencedirect.com/science/article/abs/pii/S1571064505000138
Nice pedagogical review of the master equation framework – covers the conceptual foundations and different analytical approaches.

2. Shahrezaei & Swain (2008) – “Analytical distributions for stochastic gene expression”
https://pubmed.ncbi.nlm.nih.gov/18988743/
This one derives the exact analytical solutions to the master equation for protein/mRNA distributions.

The Paulsson paper is probably better as a tutorial.
However, it’s a bit difficult to connect to protein & mRNA.
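The simplest master-equation example in this literature, constitutive mRNA production at rate k with first-order degradation at rate gamma per molecule, has a Poisson steady state with mean k/gamma. A Gillespie-style simulation sketch of my own (rate values are arbitrary):

```python
import numpy as np

def gillespie_mrna(k=10.0, gamma=1.0, t_end=2000.0, seed=0):
    """Simulate the birth-death master equation for mRNA copy number n:
    production at rate k, degradation at rate gamma * n."""
    rng = np.random.default_rng(seed)
    t, n = 0.0, 0
    times, counts = [0.0], [0]
    while t < t_end:
        rates = np.array([k, gamma * n])     # [production, degradation]
        total = rates.sum()
        t += rng.exponential(1.0 / total)    # waiting time to next reaction
        if rng.random() < rates[0] / total:
            n += 1                           # production event
        else:
            n -= 1                           # degradation event
        times.append(t); counts.append(n)
    return np.array(times), np.array(counts)

times, counts = gillespie_mrna()
# the time-averaged copy number should approach the Poisson mean k/gamma = 10
dt = np.diff(times)
mean_n = float(np.sum(counts[:-1] * dt) / np.sum(dt))
```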

Huntington disease | Nature Reviews Disease Primers

February 22, 2026

https://www.nature.com/articles/nrdp20155


Bates, G. P., Dorsey, R., Gusella, J. F., Hayden, M. R., Kay, C., Leavitt, B. R., Nance, M., Ross, C. A., Scahill, R. I., Wetzel, R., Wild, E. J., & Tabrizi, S. J. (2015). Huntington disease. Nature Reviews Disease Primers, 1(1), 15005.
https://doi.org/10.1038/nrdp.2015.5