November 12, 2019

How to check all the broken links in a web page using Selenium C# (.Net)

To check whether all the links on a web page are working, we first need to collect every link on the page.
To do this, find all the elements with the anchor tag (driver.FindElements(By.TagName("a"))) and keep them as a list.

Once you have all the link elements on the page, read the "href" attribute (the URL) of each element and collect the values in a list of strings. These are the URLs we will send HTTP requests to.

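        // Requires: using System; using System.Collections.Generic; using OpenQA.Selenium;
        public List<string> GetAllLinks()
        {
            // Crude same-site prefix: everything before the first '.' in the current URL,
            // e.g. "https://www" for "https://www.example.com/page".
            string currentURL = driver.Url.Split('.')[0];

            List<string> LinksList = new List<string>();
            IList<IWebElement> LinkElements = driver.FindElements(By.TagName("a"));

            foreach (IWebElement item in LinkElements)
            {
                try
                {
                    string getURL = item.GetAttribute("href");

                    // Anchors without an href return null; skip them.
                    if (getURL != null && getURL.StartsWith(currentURL))
                    {
                        LinksList.Add(getURL);
                    }
                }
                catch (Exception e)
                {
                    // For example, a stale element if the page changed while iterating.
                    Console.WriteLine(e.Message);
                }
            }

            return LinksList;
        }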
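
The prefix check above (everything before the first '.') is quite loose: for a page on https://www.example.com it keeps any link that starts with "https://www". If you want only links on the same host, something like the following could be used instead. This is a minimal sketch, not part of the original method; the class name LinkFilter and the method SameHostLinks are hypothetical names chosen for illustration, and it works on plain strings so it can be tried without a browser.

```csharp
using System;
using System.Collections.Generic;

public static class LinkFilter
{
    // Keeps only absolute http/https URLs whose host equals the host of pageUrl.
    // Null or empty hrefs, mailto:/javascript: links and duplicates are dropped.
    public static List<string> SameHostLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        string pageHost = new Uri(pageUrl).Host;
        var result = new List<string>();

        foreach (string href in hrefs)
        {
            if (string.IsNullOrWhiteSpace(href)) continue;

            Uri uri;
            if (!Uri.TryCreate(href, UriKind.Absolute, out uri)) continue;

            // Only web links can be checked with an HTTP request.
            if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) continue;

            if (!string.Equals(uri.Host, pageHost, StringComparison.OrdinalIgnoreCase)) continue;

            if (!result.Contains(href)) result.Add(href);
        }

        return result;
    }

    public static void Main()
    {
        var hrefs = new[]
        {
            "https://www.example.com/about",
            null,
            "mailto:someone@example.com",
            "https://www.other.com/page",
            "https://www.example.com/about"
        };

        foreach (string link in SameHostLinks("https://www.example.com/home", hrefs))
        {
            Console.WriteLine(link);
        }
    }
}
```

Inside GetAllLinks you could pass driver.Url as pageUrl and the collected href values as hrefs.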

As the next step, we need to send an HTTP request to each of these URLs and check the response to identify the broken links.

We can pass each URL we extracted above to the method below to achieve this.

You can see I call the method InitiateSSLTrust() before sending the HTTP request. It makes the certificate validation callback accept every certificate, so SSL certificate errors do not stop the check during execution.


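        // Requires: using System; using System.Net; using System.Net.Security;
        //           using System.Security.Cryptography.X509Certificates;
        public string GetHttpStatus(string url)
        {
            try
            {
                InitiateSSLTrust();

                HttpWebRequest webReq = (HttpWebRequest)WebRequest.Create(url);
                webReq.UseDefaultCredentials = true;
                webReq.UserAgent = "Link Checker";
                if (webReq.Proxy != null)
                {
                    webReq.Proxy.Credentials = CredentialCache.DefaultCredentials;
                }

                // Dispose the response so the connection is released for the next request.
                using (HttpWebResponse response = (HttpWebResponse)webReq.GetResponse())
                {
                    return response.StatusCode.ToString();
                }
            }
            catch (WebException e)
            {
                // 4xx/5xx answers are thrown as WebException; report their status code if there is one.
                HttpWebResponse errorResponse = e.Response as HttpWebResponse;
                if (errorResponse != null)
                {
                    using (errorResponse)
                    {
                        return errorResponse.StatusCode.ToString();
                    }
                }
                return e.Message;
            }
            catch (Exception e)
            {
                return e.Message;
            }
        }

        public static void InitiateSSLTrust()
        {
            // Accept every server certificate. This setting is static, so it applies
            // to all requests made by the process, not only to the link checker.
            ServicePointManager.ServerCertificateValidationCallback =
                new RemoteCertificateValidationCallback(
                    delegate (object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors errors)
                    {
                        return true;
                    });
        }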
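
The string returned by GetHttpStatus still has to be interpreted. Comparing it with "OK" treats every other success status (for example "NoContent") as a broken link. A slightly more tolerant check could look like the sketch below. It assumes the string is either the name of an HttpStatusCode value or an error message; the class name StatusCheck and the method IsWorking are hypothetical names used only for illustration.

```csharp
using System;
using System.Net;

public static class StatusCheck
{
    // Returns true when status is the name of a 2xx or 3xx HttpStatusCode
    // (e.g. "OK", "NoContent"). Error messages and 4xx/5xx names return false.
    public static bool IsWorking(string status)
    {
        if (string.IsNullOrWhiteSpace(status)) return false;

        string trimmed = status.Trim();

        // Enum.TryParse also accepts numeric text such as "200"; only accept names here.
        if (!char.IsLetter(trimmed[0])) return false;

        HttpStatusCode code;
        if (!Enum.TryParse(trimmed, out code)) return false;

        int value = (int)code;
        return value >= 200 && value < 400;
    }

    public static void Main()
    {
        Console.WriteLine(IsWorking("OK"));
        Console.WriteLine(IsWorking("NotFound"));
        Console.WriteLine(IsWorking("Unable to connect to the remote server"));
    }
}
```

Whether redirects (3xx) should count as working depends on your site, so adjust the range as needed.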

This execution can be made faster using multithreading in C#, through the Parallel.ForEach method.

You can use the method below to implement this.


        // Requires: using System.Collections.Generic; using System.Threading.Tasks;
        public void CheckAllURLS(List<string> arrFoundLinks)
        {
            // Optional: cap the number of concurrent checks by passing these
            // options as the second argument to Parallel.ForEach.
            //var options = new ParallelOptions
            //{
            //    MaxDegreeOfParallelism = 300
            //   // MaxDegreeOfParallelism = Environment.ProcessorCount
            //};

            Parallel.ForEach(arrFoundLinks, url =>
            {
                string UrlStatus = GetHttpStatus(url);

                if (UrlStatus == "OK")
                {
                    System.Diagnostics.Debug.WriteLine(url + " Status : " + UrlStatus + "--->>>> Link is valid");
                }
                else
                {
                    System.Diagnostics.Debug.WriteLine(url + " Status : " + UrlStatus + "--->>>> Link is not valid");
                }
            });
        }
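
If you would rather collect the broken links than only write them to the debug output, the same Parallel.ForEach loop can fill a thread-safe collection. The following is a rough sketch under a few assumptions: LinkReport and FindBroken are hypothetical names, the status check is passed in as a delegate (in the real test you would pass GetHttpStatus), and the degree of parallelism is capped with ParallelOptions. The Main method uses a fake status function so the sketch runs without network access.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LinkReport
{
    // Runs checkStatus for every distinct URL, at most maxParallel at a time,
    // and returns the URLs whose status is not "OK" together with that status.
    public static IDictionary<string, string> FindBroken(
        IEnumerable<string> urls, Func<string, string> checkStatus, int maxParallel)
    {
        // ConcurrentDictionary is safe to write to from several threads at once.
        var broken = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(urls.Distinct(), options, url =>
        {
            string status = checkStatus(url);
            if (status != "OK")
            {
                broken[url] = status;
            }
        });

        return broken;
    }

    public static void Main()
    {
        // Stand-in for GetHttpStatus: pretends that URLs ending in "/missing" are broken.
        Func<string, string> fakeStatus = url => url.EndsWith("/missing") ? "NotFound" : "OK";

        var urls = new[] { "https://example.com/", "https://example.com/missing" };

        foreach (var pair in FindBroken(urls, fakeStatus, 4))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

A test can then assert that the returned dictionary is empty and print its contents when it is not.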


If you come across any SSL certificate issue, go through the following post to find the fix:
Disable SSL verification while using HttpsURLConnection - C#