Read and parse an HTML page

 
Carl Schreiber #:

No, not according to https://en.wikipedia.org/wiki/Web_crawler:

I don't consider my manual mouse clicks to be a bot.

Wikipedia as a reference? Are you serious?

Anyway, it was just to inform you. You are free to ignore reality; that seems to be the new paradigm nowadays.

 
Alain Verleyen #:

Wikipedia as a reference? Are you serious?

Why?

 

Here is the definition from heise.de: https://www.heise-homepages.de/glossary/crawler/

A crawler, also known as a web crawler, spider or search engine robot, is an automated program or script that systematically searches the Internet to index websites. These programs navigate autonomously from link to link and page to page to collect data. The information collected is then used to create an index for search engines so that users can obtain relevant results for their search queries. (translated from the German by deepl.com)

Again, I don't consider my manual mouse clicks to be "an automated program or script".

 
Carl Schreiber #:

Here is the definition from heise.de: https://www.heise-homepages.de/glossary/crawler/

Again, I don't consider my manual mouse clicks to be "an automated program or script".

Again, what you consider is not what matters.

Believe it or not, manual crawling is a thing. Crawling is NOT only an automated process. Case closed on my side.

 
Carl Schreiber #:

Why?

Here is what ChatGPT says about Wikipedia. I find it a good answer, but still too optimistic (I know of dozens of topics where Wikipedia provides biased or wrong information). I will not discuss it further though, as it is off-topic for this forum.

Wikipedia is a useful resource for many purposes, but its reliability depends on how you use it and the context in which you're evaluating the information. Here are some key considerations:

Strengths of Wikipedia:

  1. Open Access: It's freely available to anyone with an internet connection.
  2. Wide Coverage: Wikipedia covers an extensive range of topics, often with summaries and introductions that are easy to understand.
  3. References: Many articles provide citations to primary and secondary sources, allowing you to verify the information or delve deeper.
  4. Collaborative Nature: Articles are reviewed and edited by a global community, which helps to correct errors over time.

Weaknesses of Wikipedia:

  1. Open-Editing Model: Because anyone can edit, information can sometimes be incorrect, biased, or incomplete.
  2. Vandalism and Bias: Articles can be subject to intentional misinformation or skewed perspectives, especially on controversial or highly debated topics.
  3. Uneven Quality: While some pages are extensively researched and cited, others lack references or have low-quality citations.
  4. Not a Primary Source: Wikipedia itself is a tertiary source, summarizing information from other references rather than conducting original research.

Best Practices for Using Wikipedia:

  • Start Here, Verify Elsewhere: Use Wikipedia to gain a general understanding, then check the cited references for verification and more detailed information.
  • Check the Sources: Look at the references provided in an article to determine the reliability of the information.
  • Use for Non-Critical Contexts: It's fine for informal research, background knowledge, or as a jumping-off point but less ideal for academic or professional work unless supported by reputable sources.
  • Be Cautious with Controversial Topics: Articles on polarizing or current issues may be more prone to bias or editing wars.

In summary, Wikipedia can be reliable as a starting point for research, but always corroborate its information with trusted primary or secondary sources.