Yesterday during a FACEIT session one of my teammates had his speakers on, so I could hear myself. That wouldn't be concerning if it weren't for the fact that I heard what I said after a 2 second delay. Assuming my ping and his were each about 30 ms, the total travel time of my voice there and back shouldn't be more than 200 ms. According to the information I found, voice chat in CS2 uses UDP.
That's the simplified path of my voice.
voice generated
↓
server
↓
voice received and sent back at the same time + input lag
↓
server
↓
voice received
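To put rough numbers on that path, here is a small sketch of the latency budget. It assumes that one-way latency is about half the ping and that my teammate's speakers-to-microphone loop adds around 50 ms. Both of those are my guesses, not measured values, and the function name is just made up for illustration.

```python
def expected_round_trip_ms(my_ping_ms, teammate_ping_ms, teammate_device_lag_ms):
    """Rough estimate of how long my voice takes to come back to me.

    Assumption: ping is a round trip to the server, so one hop
    (client -> server or server -> client) costs about half of it.
    """
    my_one_way = my_ping_ms / 2
    teammate_one_way = teammate_ping_ms / 2

    outbound = my_one_way + teammate_one_way   # me -> server -> teammate
    inbound = teammate_one_way + my_one_way    # teammate -> server -> me
    return outbound + teammate_device_lag_ms + inbound


# 50 ms for the teammate's speakers + microphone is an assumed figure.
estimate = expected_round_trip_ms(30, 30, 50)
observed = 2000  # the roughly 2 second echo I heard
print(f"expected about {estimate:.0f} ms, observed about {observed} ms")
print(f"unexplained: about {observed - estimate:.0f} ms")
```

Even if the device lag guess is off by a lot, the estimate stays well under 200 ms, so most of the 2 seconds has to come from somewhere other than the plain network hops.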
Do a test with a friend on a FACEIT server, or any other server that is not located near you. Record an audio track with OBS and measure the time between your microphone input and your voice coming back. If that time is much higher than expected, then the answer lies in the quality of your internet connection.
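If you don't want to eyeball the waveform, the delay can be estimated with a cross-correlation. This is only a sketch: it assumes you have already loaded the microphone track and the game audio track from the recording as two arrays with the same sample rate (loading the files is not shown), and the function name is hypothetical. The demo at the bottom uses synthetic noise, not a real recording.

```python
import numpy as np


def estimate_delay_seconds(mic, echo, sample_rate):
    """Estimate how many seconds `echo` lags behind `mic`.

    A positive result means the echo arrives later than the microphone input.
    """
    mic = np.asarray(mic, dtype=float)
    echo = np.asarray(echo, dtype=float)
    mic = mic - mic.mean()
    echo = echo - echo.mean()

    # Pad to a power of two so the circular correlation does not wrap around.
    n = len(mic) + len(echo) - 1
    size = 1 << (n - 1).bit_length()

    # Cross-correlation via FFT: corr[k] = sum over t of echo[t + k] * mic[t]
    spectrum = np.fft.rfft(echo, size) * np.conj(np.fft.rfft(mic, size))
    corr = np.fft.irfft(spectrum, size)

    # Non-negative lags sit at the start of `corr`, negative lags at the end.
    max_lag = len(echo) - 1
    min_lag = -(len(mic) - 1)
    lags = np.concatenate((np.arange(0, max_lag + 1), np.arange(min_lag, 0)))
    values = np.concatenate((corr[: max_lag + 1], corr[size + min_lag:]))

    best_lag = lags[np.argmax(values)]
    return best_lag / sample_rate


# Synthetic demo: a quieter copy of the "mic" signal that starts 2 s later.
sample_rate = 1000
rng = np.random.default_rng(0)
mic_track = rng.standard_normal(3 * sample_rate)
echo_track = np.zeros_like(mic_track)
echo_track[2 * sample_rate:] = 0.5 * mic_track[:sample_rate]

print(estimate_delay_seconds(mic_track, echo_track, sample_rate))
```

Real voice recordings are noisier than this demo, so treat the result as an estimate and sanity check it against what you see in the waveform.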
Good luck