Regex for getting the host of a website

I ran into a problem where we wanted to strip the URLs of different sites down to their root URL. On top of that, I also wanted to remove anything before the domain name, so that only the domain and its suffix remain.

I ended up solving this issue using a regular expression:


Explaining from the last character of the regex to the first one:

  • $ anchors the match to the end of the string
  • Inside the () group there are two alternatives, e.g. (a|b) matches either a or b:

    • [^.]{2,}: matches a single-part suffix of two or more non-dot characters, such as ‘com’, ‘uk’ or ‘london’
    • [^.]{2,3}\.[^.]{2}: matches a two-part suffix made of two or three non-dot characters, a dot, and two more non-dot characters
  • [^.]*\. matches the domain label and the dot after it, such as ‘domain.’ or ‘google.’
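
The pieces described above can be put together into one pattern, roughly as sketched below. This is a reconstruction from the explanation, not necessarily the exact regex used in the post: the order of the two alternatives and the absence of flags are assumptions, and `ROOT_DOMAIN` and `rootDomain` are hypothetical names chosen for illustration.

```javascript
// Pattern assembled from the parts explained above.
// Assumption: alternative order and the lack of flags are guesses.
const ROOT_DOMAIN = /[^.]*\.([^.]{2,}|[^.]{2,3}\.[^.]{2})$/;

// Hypothetical helper: returns the root domain of a bare hostname,
// or null when the pattern does not match (e.g. a name with no dot).
function rootDomain(hostname) {
  const match = ROOT_DOMAIN.exec(hostname);
  return match ? match[0] : null;
}

console.log(rootDomain("www.google.com"));   // "google.com"
console.log(rootDomain("www.google.co.uk")); // "google.co.uk"
console.log(rootDomain("localhost"));        // null

// The pattern expects a hostname, not a full URL, so a full URL
// is reduced to its hostname first using the standard URL class.
const host = new URL("https://www.google.co.uk/search?q=regex").hostname;
console.log(rootDomain(host));               // "google.co.uk"
```

Because the regex is not anchored at the start, the engine tries each position from left to right and returns the first one from which the rest of the string fits the pattern, which is how the leading ‘www.’ gets dropped. One caveat of this sketch: a three-character domain under a two-character suffix fits the second alternative, so a name such as ‘sub.abc.de’ would be returned whole.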

Some test domain URLs are listed below. You can try them out in an online regex tester (make sure you select JavaScript on the right-hand side instead of PCRE):