I have always wondered about this: do I need to scale my adata before running sc.tl.score_genes?

In my opinion, no.
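If you check the function here, you can see that the means are calculated on X (or on adata.raw.X when adata.raw is set, because use_raw then defaults to True). That is, they are calculated on the unscaled expression values you stored, not on a scaled matrix.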
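A simplified sketch helps show why scaling is not needed and can even hurt. sc.tl.score_genes reports the average expression of the gene set minus the average of a reference set. The reference genes are sampled from bins of genes with similar average expression. If every gene is z-scored first, each gene's mean becomes about 0, and those bins no longer reflect expression level. The helper `simple_gene_score` and the toy Poisson data below are hypothetical and only for illustration. This is not scanpy's code: it assumes a dense cells × genes NumPy array, and scanpy also handles sparse matrices, `.raw`, and bin edges somewhat differently.

```python
import numpy as np

def simple_gene_score(X, gene_idx, n_bins=25, ctrl_size=50, seed=0):
    """Hypothetical, simplified version of the score_genes idea (not scanpy's code).

    X: dense cells x genes array of unscaled values (e.g. log-normalized counts)
    gene_idx: column indices of the genes in the signature
    """
    rng = np.random.default_rng(seed)
    gene_idx = np.asarray(gene_idx)
    gene_means = X.mean(axis=0)
    # Rank genes by average expression and split them into n_bins equal-size bins
    ranks = gene_means.argsort().argsort()
    bins = ranks * n_bins // len(gene_means)
    ctrl = set()
    for b in np.unique(bins[gene_idx]):
        # Candidate control genes: same expression bin, not in the signature
        candidates = np.setdiff1d(np.flatnonzero(bins == b), gene_idx)
        k = min(ctrl_size, len(candidates))
        ctrl.update(rng.choice(candidates, size=k, replace=False).tolist())
    ctrl = np.array(sorted(ctrl), dtype=int)
    # Score = mean over signature genes minus mean over control genes, per cell
    return X[:, gene_idx].mean(axis=1) - X[:, ctrl].mean(axis=1)

# Made-up toy data: 100 cells x 200 genes with increasing baseline expression
rng = np.random.default_rng(1)
counts = rng.poisson(lam=np.linspace(0.5, 5.0, 200), size=(100, 200))
X_log = np.log1p(counts)                                 # unscaled, log-transformed
X_scaled = (X_log - X_log.mean(axis=0)) / X_log.std(axis=0)  # per-gene z-score, similar to sc.pp.scale

score = simple_gene_score(X_log, [190, 195, 199])
print(score[:5])
print(X_log.mean(axis=0)[[0, 199]])         # gene means differ, so bins are meaningful
print(np.abs(X_scaled.mean(axis=0)).max())  # close to 0 for every gene after scaling
```

On the unscaled matrix, the expression bins separate lowly and highly expressed genes. After per-gene scaling they would be ordered essentially by floating-point noise.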