HTML scraping? Possible?

Phoenix775
Enthusiast
Posts: 11
Joined: Fri Dec 11, 2015 12:51 pm

HTML scraping? Possible?

Post by Phoenix775 » Tue Aug 09, 2016 10:08 am

Hi All

I am now looking into HTML scraping to extract information from one of our customers' websites.

Currently we log on and copy and paste the required info into Excel.

Now, I am a beginner with programming, but I am willing to take training further if I know it's the correct path to take. I've read up on a few tutorials and think it may be ideal.

Currently the process is:
- Go to the web page
- Enter login details
- Select 'ORDERS' from the banner
- Enter the order number
- Copy the supplier number to Excel
- Click on another link to open up the sub info
- Copy all the size info (size, SKU, quantity) from what appears to be one table to Excel
- From a separate table on the same page, copy the colour and sub category to Excel

Then we save the file and print using a database connection in the label.

However, I guess what I'm really asking is: can I enter an order number into a variable in NiceLabel and have it do the above automatically with the aid of a script/function, directly through NiceLabel itself?

If not, would it be possible to do it through Excel and then save the file ready to print via NiceLabel?

AM I JUST DREAMING? Or is it possible with the correct know-how?

Many Thanks P

Saso
NiceLabel
Posts: 2933
Joined: Mon Sep 04, 2006 8:09 am

Re: HTML scraping? Possible?

Post by Saso » Tue Aug 30, 2016 8:29 am

Well, you could record the keystrokes and have the computer replay them automatically through some command-line utility, started with the action Open document/program, but there might be a better option.

Does the website support any kind of API? If so, you could connect to it using the HTTP/Web Service call from Automation and call the API method that returns the information you need. You would not extract data from HTML code; you would get it back in a structured format (e.g. XML, JSON).
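
To illustrate why a structured response is easier to work with than HTML, here is a minimal Python sketch that flattens one order into rows. The JSON layout, the field names (`supplier_number`, `sizes`, `sku`, `quantity`) and the sample payload are all assumptions invented for this example; a real API, if the site has one, will define its own.

```python
import json

# Hypothetical payload: what an order API *might* return. The real field
# names depend entirely on the customer's site.
SAMPLE_RESPONSE = """
{
  "order_number": "12345",
  "supplier_number": "S-001",
  "sizes": [
    {"size": "M", "sku": "SKU-M", "quantity": 10},
    {"size": "L", "sku": "SKU-L", "quantity": 5}
  ]
}
"""


def order_rows(response_text):
    """Flatten one order into a list of rows, one per size line."""
    order = json.loads(response_text)
    return [
        {
            "order_number": order["order_number"],
            "supplier_number": order["supplier_number"],
            "size": line["size"],
            "sku": line["sku"],
            "quantity": line["quantity"],
        }
        for line in order["sizes"]
    ]


rows = order_rows(SAMPLE_RESPONSE)
for row in rows:
    print(row)
```

Each row already has named fields, so it could be written to a file or assigned to label variables without any HTML parsing.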

If an API is not available, then you will have to entertain yourself with some scripting and/or external utilities to first get the data from the website and then parse it. Once you have extracted the necessary data, you could save it to Excel and leave it to the database connection in the label, or you could use the data directly (Automation would assign values to label variables directly, not through a database connection).
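
As a rough idea of what the "parse it" step can look like, here is a Python sketch that pulls the rows out of an HTML table and writes them as CSV, which Excel can open. It uses only the standard library. The sample HTML and the column names are made up for this example, and the parser is deliberately simple: it collects the rows of every table it sees, so a real page with several tables would need extra filtering. Writing a native .xlsx file would need a third-party library such as openpyxl.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page fragment; the real site's markup will differ.
SAMPLE_HTML = """
<table id="sizes">
  <tr><th>Size</th><th>SKU</th><th>Quantity</th></tr>
  <tr><td>M</td><td>SKU-M</td><td>10</td></tr>
  <tr><td>L</td><td>SKU-L</td><td>5</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableParser()
    parser.feed(html_text)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Getting the HTML in the first place (logging in, opening the order page) is a separate step and depends on how the site handles sessions.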

This looks like a very interesting project...
Saso Fleiser
Senior Technical Product Specialist

Phoenix775
Enthusiast
Posts: 11
Joined: Fri Dec 11, 2015 12:51 pm

Re: HTML scraping? Possible?

Post by Phoenix775 » Tue Oct 18, 2016 3:31 pm

Thanks for your reply.

I have spent some time, not much, with a program called ParseHub, which more or less gets me to the point where the data is extracted to Excel. However, it will not let me enter a variable, for example an order number. So this method is useless, but it shows it can be done.

I have no clue whether an API is supported. Can you advise how I could test for this?

The site is password protected, but information cannot be altered by the account, so it would be possible to provide the password in a private chat.

Sounds cheeky, but I'm not about to turn down help from a knowledgeable source :)

Many Thanks

Saso
NiceLabel
Posts: 2933
Joined: Mon Sep 04, 2006 8:09 am

Re: HTML scraping? Possible?

Post by Saso » Wed Oct 19, 2016 10:01 am

Hi Phoenix775,

So you have two problems to solve:
  1. Scrape the data from the web page and save it to Excel.
  2. Print the labels with data from Excel, but only the record matching the order number.

For the second problem, the solution is available in NiceLabel software. You can build a PowerForms application where the user enters the order number and NiceLabel finds the required record. You can also leave it to NiceLabel Automation to do the same automatically (in this case, the existing application that you use has to provide the order number to Automation through some event).
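
NiceLabel does this lookup itself through the label's database connection, so no code is needed for it. Purely to show what "find the record matching the order number" amounts to, here is a small Python sketch over CSV data. The column names and the sample rows are invented for this example.

```python
import csv
import io

# Hypothetical export; in practice this would be the file saved from the scrape.
SAMPLE_CSV = """order_number,size,sku,quantity
12345,M,SKU-M,10
12345,L,SKU-L,5
67890,S,SKU-S,3
"""


def records_for_order(csv_text, order_number):
    """Return every row whose order_number column equals order_number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["order_number"] == order_number]


matches = records_for_order(SAMPLE_CSV, "12345")
for row in matches:
    print(row["sku"], row["quantity"])
```

Note that values read from CSV are strings, so the order number is compared as text.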

The first problem is a tougher nut to crack. NiceLabel doesn't have anything out of the box to scrape data from HTML pages. You can do wonders in NiceLabel Automation, but complex cases usually involve some scripting.

So, if you are happy with the result that ParseHub provides, great. You can use ParseHub to get the data into Excel and use NiceLabel for label printing. If ParseHub is a desktop application, perhaps you can run it from NiceLabel Automation (there is an action to run any third-party application). If ParseHub exposes some API that you can call (I see some API here), it could also be called from Automation (using the action HTTP request).

However, if the data scraping is not a frequent process (e.g. it happens once per month, or once per week), perhaps you can also afford to do it manually...
Saso Fleiser
Senior Technical Product Specialist

FunDeckHermit
Enthusiast
Posts: 14
Joined: Fri Apr 20, 2018 1:58 pm

Re: HTML scraping? Possible?

Post by FunDeckHermit » Mon Apr 30, 2018 2:48 pm

Hi Phoenix775,

You can create a PowerShell script that scrapes a website and connect this script to a button event in PowerForms.
Another option is AutoHotkey, which can send button/key presses or open the website in a browser. This is the simplest solution.
The AutoHotkey script can be compiled into an .exe file that can be run from NiceLabel PowerForms.

Example PowerShell code:

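# Open the page in Internet Explorer through COM automation.
$url = "https://www.w3schools.com/"
$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate($url)
# Wait until the page has finished loading (ReadyState 4 = complete).
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }

# Find the IE window whose title contains "W3Schools" (-like needs wildcards for a partial match).
$shell = New-Object -ComObject Shell.Application
$ieTab = $shell.Windows() | Where-Object { $_.LocationName -like "*W3Schools*" } | Select-Object -First 1
# Collect the title of every <div> that carries the CSS class 'w3-button'.
$buttons = ($ieTab.Document.body.getElementsByTagName('div') | Where-Object { ($_.className -split '\s+') -contains 'w3-button' }).title

Example AutoHotkey code: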

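; Pressing F5 runs Deploy().
F5::Deploy()

Deploy(){
	; Open the site in Internet Explorer and give it two seconds to load.
	Run, iexplore.exe "https://www.w3schools.com/"
	Sleep, 2000
	IfWinExist, W3Schools Online Web Tutorials - Internet Explorer
	{
		WinActivate, W3Schools Online Web Tutorials - Internet Explorer
		WinWaitActive, W3Schools Online Web Tutorials - Internet Explorer
		; Click at position 357, 438, then press Ctrl+A to select all.
		Click, 357, 438
		Send, ^a
	}
}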
