I’m trying to use sed to clean up lines of URLs to extract just the domain.
So from:
http://www.suepearson.co.uk/product/174/71/3816/
I want:
http://www.suepearson.co.uk/
(either with or without the trailing slash, it doesn’t matter)
I have tried:
sed 's|\(http:\/\/.*?\/\).*|\1|'
and (escaping the non-greedy quantifier)
sed 's|\(http:\/\/.*\?\/\).*|\1|'
but I cannot seem to get the non-greedy quantifier (?) to work, so it always ends up matching the whole string.
27 Answers
In this specific case, you can get the job done without using a non-greedy regex.
Use the negated character class [^/]* (any run of characters other than a slash) instead of .*?:
sed 's|\(http://[^/]*/\).*|\1|g'
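The reason this works where .*? does not: sed's POSIX regular expressions have no lazy quantifiers, so the ? is never read as "non-greedy". The negated class sidesteps the problem because [^/]* cannot run past the next slash. Here is a minimal sketch of the command applied to the URL from the question, assuming a POSIX shell; the variable names url and domain are only illustrative.

```shell
# Sample URL taken from the question.
url='http://www.suepearson.co.uk/product/174/71/3816/'

# [^/]* matches the host name only, because it cannot cross a slash,
# so the captured group ends at the first / after "http://".
domain=$(printf '%s\n' "$url" | sed 's|\(http://[^/]*/\).*|\1|')

printf '%s\n' "$domain"
```

This should print http://www.suepearson.co.uk/ for the sample input. The g flag is left out here since the pattern can match at most once per line.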