Non greedy (reluctant) regex matching in sed?

IT Nursery

May 5, 2022

I’m trying to use sed to clean up lines of URLs to extract just the domain.

So from:

http://www.suepearson.co.uk/product/174/71/3816/

I want:

http://www.suepearson.co.uk/

(either with or without the trailing slash, it doesn’t matter)

I have tried:

sed 's|\(http:\/\/.*?\/\).*|\1|'

and (escaping the non-greedy quantifier)

sed 's|\(http:\/\/.*\?\/\).*|\1|'

but I cannot get the non-greedy quantifier (?) to work, so the pattern always ends up matching the whole string.

27 Answers

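In this specific case, you can get the job done without a non-greedy regex. sed's basic and extended regular expressions have no non-greedy quantifier, so .*? is not available.

Use the negated character class [^/]* instead of .*?. It is still greedy, but it cannot match a slash, so it stops at the first slash after the host name: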

sed 's|\(http://[^/]*/\).*|\1|g'
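As a quick check, here is a minimal sketch that runs the command above against the sample URL from the question. The function name extract_domain is only an illustrative wrapper, not part of sed, and the sketch assumes every input line starts with http://.

```shell
# Hypothetical wrapper (the name is an assumption) around the sed command above.
# [^/]* matches any run of non-slash characters, so the captured group ends
# at the first slash after the host name; the trailing .* eats the rest.
extract_domain() {
  sed 's|\(http://[^/]*/\).*|\1|g'
}

# Feed the example URL from the question through the filter.
printf '%s\n' 'http://www.suepearson.co.uk/product/174/71/3816/' | extract_domain
# prints: http://www.suepearson.co.uk/
```

Because the rest of the line is consumed by the final .*, only one substitution can happen per line, so the g flag is harmless but not required.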

Tags: greedy pcre regex regex-greedy sed