Important: Always Scrape Responsibly
➡️ Before you begin web scraping, verify that the website’s terms of service or robots.txt file permit it. If scraping is not allowed, do not proceed. Always follow ethical web scraping practices to respect the site owner’s rights and resources.
Ethical Web Scraping Checklist ✅
Use this quick checklist before running any scraping script:
1) Check Permissions – Review the site’s Terms of Service and robots.txt file to confirm scraping is allowed.
2) Respect Rate Limits – Avoid sending too many requests in a short time; this prevents server overload.
3) Credit the Source – If you use scraped data publicly, acknowledge the original website.
4) Avoid Sensitive Data – Never scrape personal, private, or copyrighted content without permission.
5) Use Data Responsibly – Ensure the data is used for legal and ethical purposes only.
6) Test on Small Batches – Start with a small sample to ensure your script works without causing harm.
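The permissions check in step 1 can also be done programmatically before a script runs. Below is a minimal sketch of such a check — isPathAllowed is an illustrative helper name, not a standard API, and it only handles Disallow lines in the User-agent: * group; the full Robots Exclusion Protocol also covers Allow rules, wildcards, and per-agent matching.

```javascript
// Minimal robots.txt check: returns false if `path` falls under a
// Disallow rule in the `User-agent: *` group. A sketch only — real
// robots.txt semantics (Allow, wildcards, agent matching) are richer.
function isPathAllowed(robotsTxt, path) {
  const lines = robotsTxt.split('\n').map(l => l.trim());
  let inStarGroup = false;
  for (const line of lines) {
    const [field, ...rest] = line.split(':');
    const value = rest.join(':').trim();
    if (/^user-agent$/i.test(field)) {
      // Entering a new group; track whether it applies to all agents.
      inStarGroup = value === '*';
    } else if (inStarGroup && /^disallow$/i.test(field) && value) {
      if (path.startsWith(value)) return false;
    }
  }
  return true;
}
```

In practice you would fetch the site's /robots.txt, pass its text to a check like this, and skip any path it disallows.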
Getting Started
The process is straightforward, but you’ll need to identify two key elements first:
1. The target web page you want to scrape.
2. The HTML element (such as the <img> tag) where the images are embedded.
Once you have these, you can proceed with extracting the image data efficiently.
Most images you see on a web page are rendered using the <img> tag. In some cases, though, they’re added dynamically with JavaScript, jQuery, or similar scripts, often to load content on demand or create interactive effects.
Let's assume a few images are embedded in this particular web page using the <img> tag. Here's the script to fetch the web page and extract all the images.
<body>
  <!-- show extracted images here -->
  <div id='container' style='padding:10px; border:solid 1px #ddd;'></div>
</body>

<script>
  // Page to scrape.
  const url = 'https://www.encodedna.com/text-on-image/add-text-to-image-and-save-the-image.htm';

  const web_scrape = async () => {
    const response = await fetch(url);
    const html = await response.text();

    // Parse the raw HTML string into a queryable DOM.
    const webPage = new DOMParser().parseFromString(html, 'text/html');
    const art_img = webPage.getElementsByTagName('img');

    for (let i = 0; i < art_img.length; i++) {
      // Try the normal src first, then fall back to data-src (used by lazy-loading scripts).
      const imgSrc = art_img[i].getAttribute('src') || art_img[i].getAttribute('data-src');
      if (imgSrc) {
        const img = new Image();
        // Resolve relative paths against the scraped page's URL, not our own.
        img.src = new URL(imgSrc, url).href;
        document.getElementById('container').appendChild(img);
      }
    }
  };

  web_scrape();
</script>
</html>
In the above script:
fetch(url) sends an asynchronous request to retrieve the HTML of the target page.
response.text() converts the response into a text string containing the HTML source.
DOMParser().parseFromString() turns that HTML string into a DOM object so it can be queried like a normal web page.
getElementsByTagName('img') returns all <img> elements from the parsed HTML.
The loop iterates through each <img> tag, creates a new Image() object, sets its src to the found image URL, and appends it to the #container div.
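One subtlety worth knowing: src values pulled from another page are often relative paths, and a browser resolves a bare relative path against your page's address, not the scraped page's. The standard URL constructor accepts a base URL and returns a correctly resolved absolute URL. A minimal sketch — resolveSrc and the sample file names are illustrative, not part of the script above:

```javascript
// Resolve a scraped src against the URL of the page it came from.
function resolveSrc(src, pageUrl) {
  return new URL(src, pageUrl).href;
}

// The page URL from the example above.
const page = 'https://www.encodedna.com/text-on-image/add-text-to-image-and-save-the-image.htm';
```

For example, resolveSrc('../images/pic.png', page) yields an absolute URL on www.encodedna.com, while an already-absolute src passes through unchanged.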
Key Points
This method works in any modern browser without extra libraries.
You don’t need to know the CSS classes of the images—just the tag name.
Always confirm the site allows scraping before running the script.
What's great about this script is that it runs in any modern browser with no extra libraries, making this kind of scraping straightforward. One practical caveat: fetch() is subject to the browser's same-origin policy, so fetching a page on another domain will fail unless that server sends permissive CORS headers or you run the script from the same site. And as always, confirm that the site you're scraping permits it — ethical and legal compliance should come first.