Here is a list of blog posts on interpretability:
https://www.neelnanda.io/mechanistic-interpretability/favourite-papers-old
https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite-1
https://transformer-circuits.pub/2021/framework/index.html