I am trying to run the code below in my ASP.NET MVC web application to get the URL of the first image (<img>) in an HTML string:
public static List<Uri> FetchLinksFromSource(string htmlSource)
{
    List<Uri> links = new List<Uri>();
    string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
    MatchCollection matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    foreach (Match m in matchesImgSrc)
    {
        string href = m.Groups[1].Value;
        links.Add(new Uri(href)); //getting error here
    }
    return links;
}
But when I run it, I get the error "Invalid URI: The format of the URI could not be determined." on the line marked above.
How is that not a valid URI format, and how can I resolve it?
You are getting the "Invalid URI" error because new Uri(string) expects an absolute URI that includes the scheme (http:// or https://), but the src value you pass on this line may be a relative path such as /images/logo.png:
string href = m.Groups[1].Value;
So you need to resolve each relative path against the current request's scheme and host, like this:
public static List<Uri> FetchLinksFromSource(string htmlSource)
{
    List<Uri> links = new List<Uri>();
    string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
    MatchCollection matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Get the request context (System.Web)
    var request = HttpContext.Current.Request;
    // Get the URL scheme with the domain name (authority), e.g. https://foo.com
    Uri serverUri = new Uri(request.Url.Scheme + "://" + request.Url.Authority);

    foreach (Match m in matchesImgSrc)
    {
        string href = m.Groups[1].Value;

        // Parse the src value: it may be relative (/test.html) or already absolute (https://...)
        Uri parsedUri;
        if (!Uri.TryCreate(href, UriKind.RelativeOrAbsolute, out parsedUri))
            continue; // skip values that are not valid URIs

        // Combine relative paths with the server URI to get the complete URI
        Uri fullUri = parsedUri.IsAbsoluteUri ? parsedUri : new Uri(serverUri, parsedUri);

        // Now add it to the list of links
        links.Add(fullUri);
    }
    return links;
}
I have added comments to the code to explain what each step does. I hope this helps.
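If you also need to handle src values that are already absolute (for example images served from a CDN), or want to test the logic outside an HTTP request, a sketch like the one below may help. It assumes the page URL is passed in as a parameter instead of being read from HttpContext.Current; the ImageLinkExtractor class, the pageUri parameter, and the Resolve helper are hypothetical names, not part of any framework. Absolute http/https URLs are kept as they are, and everything else is resolved against the page URL.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageLinkExtractor
{
    // Same regex as in the question.
    private const string ImgSrcPattern =
        @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";

    // Hypothetical variant: the page URL is a parameter instead of
    // HttpContext.Current, so this can run (and be tested) outside a request.
    public static List<Uri> FetchLinksFromSource(string htmlSource, Uri pageUri)
    {
        var links = new List<Uri>();
        MatchCollection matches = Regex.Matches(
            htmlSource, ImgSrcPattern, RegexOptions.IgnoreCase | RegexOptions.Singleline);

        foreach (Match m in matches)
        {
            Uri resolved = Resolve(pageUri, m.Groups[1].Value);
            if (resolved != null)
                links.Add(resolved);
        }
        return links;
    }

    private static Uri Resolve(Uri pageUri, string src)
    {
        // Keep absolute http(s) URLs unchanged. The scheme check matters on
        // Linux/macOS, where .NET can parse "/images/a.png" as an absolute file: URI.
        if (Uri.TryCreate(src, UriKind.Absolute, out Uri absolute) &&
            (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            return absolute;

        // Otherwise treat the value as a relative reference and resolve it
        // against the page URL.
        if (Uri.TryCreate(src, UriKind.Relative, out Uri relative))
            return new Uri(pageUri, relative);

        // Anything else cannot be resolved; the caller skips it.
        return null;
    }
}
```

For example, with pageUri = https://example.com/, the src values /images/a.png, https://cdn.example.com/b.png and images/c.png come back as https://example.com/images/a.png, https://cdn.example.com/b.png and https://example.com/images/c.png. In an MVC controller you could pass Request.Url as pageUri; resolving against the full page URL rather than only the scheme and host handles paths without a leading slash the way a browser would (ignoring any <base> element in the page).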
You may need to put the protocol in front of your URI. If you pass a string without one, like
string server = "www.myserver.com";
then new Uri(server) throws this same error, so pass a string that includes the scheme:
System.Uri uri = new Uri("http://"+"example.com");
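If the server name comes from configuration and may or may not include the scheme, you can check first and only add one when it is missing. The following is a minimal sketch; ServerUris and FromServerName are hypothetical names. It relies on the documented behavior of the UriBuilder(string) constructor, which uses http when the string does not specify a scheme.

```csharp
using System;

public static class ServerUris
{
    // Hypothetical helper: turns a configured server name into an absolute Uri.
    public static Uri FromServerName(string server)
    {
        // Already absolute, e.g. "https://example.com": use it as parsed.
        if (Uri.TryCreate(server, UriKind.Absolute, out Uri absolute))
            return absolute;

        // No scheme, e.g. "www.myserver.com": new Uri(server) would throw
        // UriFormatException here. UriBuilder(string) falls back to http.
        return new UriBuilder(server).Uri;
    }
}
```

With this, FromServerName("www.myserver.com") gives a Uri with scheme http and host www.myserver.com, while FromServerName("https://example.com") keeps its https scheme.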
Or use a different Uri constructor:
// this works, because the protocol is included in the string
Uri serverUri = new Uri(yourServer);
// needs UriKind arg, or UriFormatException is thrown
Uri relativeUri = new Uri(YourServerRelativePath, UriKind.Relative);
// Uri(Uri, Uri) is the preferred constructor in this case
Uri fullUri = new Uri(serverUri, relativeUri);
where yourServer = "https://example.com" and YourServerRelativePath = "/category/some-url", so fullUri is https://example.com/category/some-url.
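Keep in mind that Uri(Uri, Uri) applies the standard relative-reference rules (RFC 3986), so the path of the base URI matters: a relative path without a leading slash replaces only the last segment of the base path, while one with a leading slash replaces the whole path. The sketch below wraps the two constructor calls from this answer in a hypothetical Combine helper to make that visible.

```csharp
using System;

public static class RelativeUriDemo
{
    // Hypothetical helper: resolves relativePath against baseUri using the
    // same two constructors as the answer above.
    public static string Combine(string baseUri, string relativePath)
    {
        // Absolute base, e.g. "https://example.com" (scheme included)
        Uri serverUri = new Uri(baseUri);
        // Parse as a relative reference; new Uri(relativePath) alone expects an absolute URI
        Uri relativeUri = new Uri(relativePath, UriKind.Relative);
        // Uri(Uri, Uri) resolves the relative reference against the base
        return new Uri(serverUri, relativeUri).AbsoluteUri;
    }
}
```

Combine("https://example.com", "/category/some-url") returns https://example.com/category/some-url, Combine("https://example.com/blog", "img/a.png") returns https://example.com/img/a.png, and Combine("https://example.com/blog/", "img/a.png") returns https://example.com/blog/img/a.png, so a missing trailing slash on the base changes the result.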