Sunday, April 22, 2018

Check if a Javascript string is a url

Is there a way in JavaScript to check if a string is a URL?

Regexes are excluded because the URL is most likely written like stackoverflow; that is to say, it might not have a .com, www or http.

16 Answers

Answers 1

A related question with an answer:

Javascript regex URL matching

Or this Regexp from Devshed:

    function ValidURL(str) {
      // Backslashes are doubled because the pattern is built from string literals;
      // a single backslash would be consumed by the string before RegExp sees it.
      var pattern = new RegExp('^(https?:\\/\\/)?' + // protocol
        '((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|' + // domain name
        '((\\d{1,3}\\.){3}\\d{1,3}))' + // OR ip (v4) address
        '(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*' + // port and path
        '(\\?[;&a-z\\d%_.~+=-]*)?' + // query string
        '(\\#[-a-z\\d_]*)?$', 'i'); // fragment locator
      if (!pattern.test(str)) {
        alert("Please enter a valid URL.");
        return false;
      } else {
        return true;
      }
    }

Answers 2

    function isURL(str) {
      var pattern = new RegExp('^(https?:\\/\\/)?' + // protocol
        '((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.?)+[a-z]{2,}|' + // domain name
        '((\\d{1,3}\\.){3}\\d{1,3}))' + // OR ip (v4) address
        '(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*' + // port and path
        '(\\?[;&a-z\\d%_.~+=-]*)?' + // query string
        '(\\#[-a-z\\d_]*)?$', 'i'); // fragment locator
      return pattern.test(str);
    }

Answers 3

Rather than using a regular expression, I would recommend making use of an anchor element.

When you set the href property of an anchor, various other properties are set.

    var parser = document.createElement('a');
    parser.href = "http://example.com:3000/pathname/?search=test#hash";

    parser.protocol; // => "http:"
    parser.hostname; // => "example.com"
    parser.port;     // => "3000"
    parser.pathname; // => "/pathname/"
    parser.search;   // => "?search=test"
    parser.hash;     // => "#hash"
    parser.host;     // => "example.com:3000"

source

However, if the value assigned to href is not a valid URL, then those auxiliary properties will be empty strings.

Edit: as pointed out in the comments, if an invalid URL is used, the properties of the current URL may be substituted.

So, as long as you're not passing in the URL of the current page, you can do something like:

    function isValidURL(str) {
      var a = document.createElement('a');
      a.href = str;
      // Boolean() keeps the return value a strict true/false rather than "" for an empty host.
      return Boolean(a.host) && a.host != window.location.host;
    }

Answers 4

A URL can be validated in JavaScript as shown below:

    function ValidURL(str) {
      var regex = /(http|https):\/\/(\w+:{0,1}\w*)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%!\-\/]))?/;
      if (!regex.test(str)) {
        alert("Please enter valid URL.");
        return false;
      } else {
        return true;
      }
    }

Answers 5

Improvement on the accepted answer...

  • Has double escaping for backslashes (\\)
  • Ensures that domains have a dot and an extension (.com .io .xyz)
  • Allows full colon (:) in the path e.g. http://thingiverse.com/download:1894343
  • Allows ampersand (&) in path e.g. http://en.wikipedia.org/wiki/Procter_&_Gamble
  • Allows @ symbol in path e.g. https://medium.com/@techytimo

    function isURL(str) {
      var pattern = new RegExp('^(https?:\\/\\/)?' + // protocol
        '((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|' + // domain name and extension
        '((\\d{1,3}\\.){3}\\d{1,3}))' + // OR ip (v4) address
        '(\\:\\d+)?' + // port
        '(\\/[-a-z\\d%@_.~+&:]*)*' + // path
        '(\\?[;&a-z\\d%@_.,~+&:=-]*)?' + // query string
        '(\\#[-a-z\\d_]*)?$', 'i'); // fragment locator
      return pattern.test(str);
    }

Answers 6

Rely on a library: https://www.npmjs.com/package/valid-url

    import { isWebUri } from 'valid-url';
    // ...
    if (!isWebUri(url)) {
        return "Not a valid url.";
    }

Answers 7

You can try to use the URL constructor: if it doesn't throw, the string is a valid URL:

    const isValidUrl = (string) => {
      try {
        new URL(string);
        return true;
      } catch (_) {
        return false;
      }
    }
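One caveat with the constructor approach is that it accepts any scheme, so strings such as javascript:void(0) also parse without throwing. Below is a minimal sketch, assuming only http and https links should count as URLs; the name isHttpUrl is hypothetical and not part of the answer above.

```javascript
// Hypothetical helper: accept only strings that parse as absolute http(s) URLs.
function isHttpUrl(string) {
  let url;
  try {
    url = new URL(string); // throws a TypeError when the string cannot be parsed
  } catch (_) {
    return false;
  }
  // The protocol property includes the trailing colon.
  return url.protocol === 'http:' || url.protocol === 'https:';
}

console.log(isHttpUrl('https://stackoverflow.com/')); // true
console.log(isHttpUrl('javascript:void(0)'));         // false
console.log(isHttpUrl('stackoverflow'));              // false
```

Note that a bare word like stackoverflow is rejected here because, without a base, it is not an absolute URL.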

Answers 8

I can't comment on the closest post, #5717133, but below is how I got @tom-gullen's regex working.

/^(https?:\/\/)?((([a-z\d]([a-z\d-]*[a-z\d])*)\.)+[a-z]{2,}|((\d{1,3}\.){3}\d{1,3}))(\:\d+)?(\/[-a-z\d%_.~+]*)*(\?[;&a-z\d%_.~+=-]*)?(\#[-a-z\d_]*)?$/i 

Answers 9

(I don't have reps to comment on ValidURL example; hence post this as an answer.)

While use of protocol-relative URLs is not encouraged (The Protocol-relative URL), they do get employed sometimes. To validate such a URL with a regular expression, the protocol part can be made optional, e.g.:

    function isValidURL(str) {
        var pattern = new RegExp('^((https?:)?\\/\\/)?' + // protocol
            '((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.)+[a-z]{2,}|' + // domain name
            '((\\d{1,3}\\.){3}\\d{1,3}))' + // OR ip (v4) address
            '(\\:\\d+)?(\\/[-a-z\\d%_.~+]*)*' + // port and path
            '(\\?[;&a-z\\d%_.~+=-]*)?' + // query string
            '(\\#[-a-z\\d_]*)?$', 'i'); // fragment locator
        if (!pattern.test(str)) {
            return false;
        } else {
            return true;
        }
    }

As others noted, though, a regular expression does not seem to be the best-suited approach for validating URLs.

Answers 10

One function that I have been using to validate a URL "string" is:

    var matcher = /^(?:\w+:)?\/\/([^\s\.]+\.\S{2}|localhost[\:?\d]*)\S*$/;

    function isUrl(string) {
      return matcher.test(string);
    }

This function returns a boolean indicating whether the string is a URL.

Answers 11

As has been noted the perfect regex is elusive but still seems to be a reasonable approach (alternatives are server side tests or the new experimental URL API). However the high ranking answers are often returning false for common URLs but even worse will freeze your app/page for minutes on even as simple a string as isURL('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'). It's been pointed out in some of the comments, but most probably haven't entered a bad value to see it. Hanging like that makes that code unusable in any serious application. I think it's due to the repeated case insensitive sets in code like ((([a-z\\d]([a-z\\d-]*[a-z\\d])*)\\.?)+[a-z]{2,}|' .... Take out the 'i' and it doesn't hang but will of course not work as desired. But even with the ignore case flag those tests reject high unicode values that are allowed.

The best already mentioned is:

    function isURL(str) {
      return /^(?:\w+:)?\/\/([^\s\.]+\.\S{2}|localhost[\:?\d]*)\S*$/.test(str);
    }

That comes from GitHub segmentio/is-url. The good thing about a code repository is that you can see the testing and any issues, and also the test strings run through it. There's a branch that would allow strings missing the protocol, like google.com.

There is one other repository I've seen that is even better for isURL at dperini/regex-weburl.js, but it is highly complex. It has a bigger test list of valid and invalid URLs. The simple one above still passes all the positives and only fails to block a few odd negatives like http://a.b--c.de/ as well as the special ips.

Whichever you choose, run it through this function, which I've adapted from the tests on dperini/regex-weburl.js, while using your browser's Developer Tools inspector.

    function testIsURL() {
      // should match
      console.assert(isURL("http://foo.com/blah_blah"));
      console.assert(isURL("http://foo.com/blah_blah/"));
      console.assert(isURL("http://foo.com/blah_blah_(wikipedia)"));
      console.assert(isURL("http://foo.com/blah_blah_(wikipedia)_(again)"));
      console.assert(isURL("http://www.example.com/wpstyle/?p=364"));
      console.assert(isURL("https://www.example.com/foo/?bar=baz&inga=42&quux"));
      console.assert(isURL("http://✪df.ws/123"));
      console.assert(isURL("http://userid:password@example.com:8080"));
      console.assert(isURL("http://userid:password@example.com:8080/"));
      console.assert(isURL("http://userid@example.com"));
      console.assert(isURL("http://userid@example.com/"));
      console.assert(isURL("http://userid@example.com:8080"));
      console.assert(isURL("http://userid@example.com:8080/"));
      console.assert(isURL("http://userid:password@example.com"));
      console.assert(isURL("http://userid:password@example.com/"));
      console.assert(isURL("http://142.42.1.1/"));
      console.assert(isURL("http://142.42.1.1:8080/"));
      console.assert(isURL("http://➡.ws/䨹"));
      console.assert(isURL("http://⌘.ws"));
      console.assert(isURL("http://⌘.ws/"));
      console.assert(isURL("http://foo.com/blah_(wikipedia)#cite-1"));
      console.assert(isURL("http://foo.com/blah_(wikipedia)_blah#cite-1"));
      console.assert(isURL("http://foo.com/unicode_(✪)_in_parens"));
      console.assert(isURL("http://foo.com/(something)?after=parens"));
      console.assert(isURL("http://☺.damowmow.com/"));
      console.assert(isURL("http://code.google.com/events/#&product=browser"));
      console.assert(isURL("http://j.mp"));
      console.assert(isURL("ftp://foo.bar/baz"));
      console.assert(isURL("http://foo.bar/?q=Test%20URL-encoded%20stuff"));
      console.assert(isURL("http://مثال.إختبار"));
      console.assert(isURL("http://例子.测试"));
      console.assert(isURL("http://उदाहरण.परीक्षा"));
      console.assert(isURL("http://-.~_!$&'()*+,;=:%40:80%2f::::::@example.com"));
      console.assert(isURL("http://1337.net"));
      console.assert(isURL("http://a.b-c.de"));
      console.assert(isURL("http://223.255.255.254"));
      console.assert(isURL("postgres://u:p@example.com:5702/db"));
      console.assert(isURL("https://d1f4470da51b49289906b3d6cbd65074@app.getsentry.com/13176"));

      // should not match
      console.assert(!isURL("http://"));
      console.assert(!isURL("http://."));
      console.assert(!isURL("http://.."));
      console.assert(!isURL("http://../"));
      console.assert(!isURL("http://?"));
      console.assert(!isURL("http://??"));
      console.assert(!isURL("http://??/"));
      console.assert(!isURL("http://#"));
      console.assert(!isURL("http://##"));
      console.assert(!isURL("http://##/"));
      console.assert(!isURL("http://foo.bar?q=Spaces should be encoded"));
      console.assert(!isURL("//"));
      console.assert(!isURL("//a"));
      console.assert(!isURL("///a"));
      console.assert(!isURL("///"));
      console.assert(!isURL("http:///a"));
      console.assert(!isURL("foo.com"));
      console.assert(!isURL("rdar://1234"));
      console.assert(!isURL("h://test"));
      console.assert(!isURL("http:// shouldfail.com"));
      console.assert(!isURL(":// should fail"));
      console.assert(!isURL("http://foo.bar/foo(bar)baz quux"));
      console.assert(!isURL("ftps://foo.bar/"));
      console.assert(!isURL("http://-error-.invalid/"));
      console.assert(!isURL("http://a.b--c.de/"));
      console.assert(!isURL("http://-a.b.co"));
      console.assert(!isURL("http://a.b-.co"));
      console.assert(!isURL("http://0.0.0.0"));
      console.assert(!isURL("http://10.1.1.0"));
      console.assert(!isURL("http://10.1.1.255"));
      console.assert(!isURL("http://224.1.1.1"));
      console.assert(!isURL("http://1.1.1.1.1"));
      console.assert(!isURL("http://123.123.123"));
      console.assert(!isURL("http://3628126748"));
      console.assert(!isURL("http://.www.foo.bar/"));
      console.assert(!isURL("http://www.foo.bar./"));
      console.assert(!isURL("http://.www.foo.bar./"));
      console.assert(!isURL("http://10.1.1.1"));
    }

And then test that string of 'a's.
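For checking outside a browser console, the same idea can be written as a small table-driven harness that returns the misclassified inputs and also times a single call, which is a convenient way to try the string of 'a's. This is only a sketch: collectFailures and timeValidator are hypothetical names, the case list is a short excerpt of the list above, and the regex is the segmentio/is-url one quoted earlier in this answer.

```javascript
// Regex quoted earlier in this answer (from segmentio/is-url).
function isURL(str) {
  return /^(?:\w+:)?\/\/([^\s\.]+\.\S{2}|localhost[\:?\d]*)\S*$/.test(str);
}

// Hypothetical helper: run a validator over [input, expected] pairs and
// return the inputs whose result differs from the expectation.
function collectFailures(validator, cases) {
  return cases
    .filter(([input, expected]) => validator(input) !== expected)
    .map(([input]) => input);
}

// Hypothetical helper: wall-clock milliseconds spent validating one input.
function timeValidator(validator, input) {
  const start = Date.now();
  validator(input);
  return Date.now() - start;
}

// A short excerpt of the positive and negative cases listed above.
const cases = [
  ['http://foo.com/blah_blah', true],
  ['ftp://foo.bar/baz', true],
  ['foo.com', false],
  ['http://', false],
  ['//a', false],
];

console.log(collectFailures(isURL, cases)); // inputs that were misclassified
console.log(timeValidator(isURL, 'a'.repeat(35)) + ' ms');
```

Be careful when pointing timeValidator at one of the backtracking-prone patterns: the call is synchronous, so a pathological input will block until the regex engine finishes.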

Answers 12

I am using the function below to validate a URL with or without http/https:

    function isValidURL(string) {
      var res = string.match(/(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/g);
      if (res == null)
        return false;
      else
        return true;
    };

    var testCase1 = "http://en.wikipedia.org/wiki/Procter_&_Gamble";
    console.log(isValidURL(testCase1)); // return true

    var testCase2 = "http://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&docid=nIv5rk2GyP3hXM&tbnid=isiOkMe3nCtexM:&ved=0CAUQjRw&url=http%3A%2F%2Fanimalcrossing.wikia.com%2Fwiki%2FLion&ei=ygZXU_2fGKbMsQTf4YLgAQ&bvm=bv.65177938,d.aWc&psig=AFQjCNEpBfKnal9kU7Zu4n7RnEt2nerN4g&ust=1398298682009707";
    console.log(isValidURL(testCase2)); // return true

    var testCase3 = "https://sdfasd";
    console.log(isValidURL(testCase3)); // return false

    var testCase4 = "dfdsfdsfdfdsfsdfs";
    console.log(isValidURL(testCase4)); // return false

    var testCase5 = "magnet:?xt=urn:btih:123";
    console.log(isValidURL(testCase5)); // return false

    var testCase6 = "https://stackoverflow.com/";
    console.log(isValidURL(testCase6)); // return true

    var testCase7 = "https://w";
    console.log(isValidURL(testCase7)); // return false

    var testCase8 = "https://sdfasdp.ppppppppppp";
    console.log(isValidURL(testCase8)); // return false

Answers 13

You can use the URL native API:

    const isUrl = string => {
      try { return Boolean(new URL(string)); }
      catch (e) { return false; }
    }
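Beyond a yes/no answer, the object returned by the URL constructor exposes the same component properties as the anchor element approach in Answers 3, without needing a DOM. A small sketch of that, where the helper name urlParts is an assumption made for illustration:

```javascript
// Hypothetical helper: return the main components of a URL string,
// or null when the string cannot be parsed as an absolute URL.
function urlParts(string) {
  try {
    const { protocol, hostname, port, pathname, search, hash, host } = new URL(string);
    return { protocol, hostname, port, pathname, search, hash, host };
  } catch (_) {
    return null;
  }
}

const parts = urlParts('http://example.com:3000/pathname/?search=test#hash');
console.log(parts.protocol); // "http:"
console.log(parts.host);     // "example.com:3000"
console.log(parts.pathname); // "/pathname/"
console.log(urlParts('stackoverflow')); // null
```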

Answers 14

Here is yet another method.

    var elm;
    function isValidURL(u) {
      if (!elm) {
        elm = document.createElement('input');
        elm.setAttribute('type', 'url');
      }
      elm.value = u;
      return elm.validity.valid;
    }

    console.log(isValidURL('http://www.google.com/'));
    console.log(isValidURL('//google.com'));
    console.log(isValidURL('google.com'));
    console.log(isValidURL('localhost:8000'));

Answers 15

The question asks for a way to validate a URL such as stackoverflow, without the protocol or any dot in the hostname. So it's not a matter of validating URL syntax, but of checking whether it is a valid URL by actually calling it.

I tried several methods for finding out whether the URL truly exists and is callable from within the browser, but did not find any way to test the response header of the call with JavaScript:

  • adding an anchor element is fine for firing the click() method.
  • making an ajax call to the URL being tested with 'GET' is fine, but has its various limitations due to CORS policies, and ajax is not suitable here because the URL may be any outside my server's domain.
  • using the fetch API has a workaround similar to ajax.
  • another problem is that my server runs under the https protocol and throws an exception when calling non-secure URLs.

So, the best solution I can think of is getting some tool to perform cURL from JavaScript, trying something like curl -I <url>. Unfortunately I did not find any, and apparently it's not possible. I would appreciate any comments on this.

But, in the end, I have a server running PHP and as I use Ajax for almost all my requests, I wrote a function on the server side to perform the curl request there and return to the browser.

Regarding the single word url on the question 'stackoverflow' it will lead me to https://daniserver.com.ar/stackoverflow, where daniserver.com.ar is my own domain.
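The resolution described here, where a bare word is treated as a path relative to the current site, can be reproduced with the optional base argument of the URL constructor. The sketch below uses this answer's domain as the base; the helper name resolveAgainst is hypothetical.

```javascript
// Hypothetical helper: resolve a string against a base URL, the way a
// browser resolves a relative link on a page served from that base.
function resolveAgainst(string, base) {
  try {
    return new URL(string, base).href;
  } catch (_) {
    return null;
  }
}

const base = 'https://daniserver.com.ar/';

console.log(resolveAgainst('stackoverflow', base));
// "https://daniserver.com.ar/stackoverflow"
console.log(resolveAgainst('//cdn.google.com/script.js', base));
// "https://cdn.google.com/script.js"
console.log(resolveAgainst('https://stackoverflow.com/', base));
// "https://stackoverflow.com/"
```

With a base supplied, almost any string resolves to something, so this shows where a string would lead rather than whether it is a valid URL on its own.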

Answers 16

I think using the native URL API is better than complex regex patterns, as @pavlo suggested. It has some drawbacks, though, which we can fix with some extra code. This approach fails for the following valid URL.

//cdn.google.com/script.js 

We can add the missing protocol beforehand to avoid that. It also fails to detect the following invalid URLs.

http://w
http://..

So why check the whole URL? We can just check the domain. I borrowed the regex to verify the domain from here.

    function isValidUrl(string) {
        if (string && string.length > 1 && string.slice(0, 2) == '//') {
            string = 'http:' + string; // dummy protocol so that URL works
        }
        try {
            var url = new URL(string);
            return url.hostname && url.hostname.match(/^([a-z0-9])(([a-z0-9-]{1,61})?[a-z0-9]{1})?(\.[a-z0-9](([a-z0-9-]{1,61})?[a-z0-9]{1})?)?(\.[a-zA-Z]{2,4})+$/) ? true : false;
        } catch (_) {
            return false;
        }
    }

The hostname attribute is an empty string for javascript:void(0), so it works for that too, and you can also add an IP address verifier. I'd like to stick to native APIs as much as possible, and hope they start to support everything in the near future.
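The IP address verifier mentioned above could look something like the following. It is a sketch only, assuming dotted-decimal IPv4 with each octet in the range 0 to 255; the name isIPv4Host is hypothetical, and the result could be OR-ed with the domain regex check on url.hostname.

```javascript
// Hypothetical helper: true when hostname is four dot-separated decimal
// octets, each between 0 and 255. IPv6 is deliberately not handled.
function isIPv4Host(hostname) {
  const parts = hostname.split('.');
  return parts.length === 4 &&
    parts.every((part) => /^\d{1,3}$/.test(part) && Number(part) <= 255);
}

console.log(isIPv4Host('142.42.1.1'));  // true
console.log(isIPv4Host('256.1.1.1'));   // false
console.log(isIPv4Host('123.123.123')); // false
console.log(isIPv4Host('example.com')); // false
```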
