I have the following Java program, which is supposed to log in to my student website programmatically and return the HTML of the Gradebook. However, when I run it, I get back the HTML of the login page instead:
public class Scraper {

    static String formData = "j_username=[username here]&j_password=[password here]";
    static String link = "https://parents.mtsd.k12.nj.us/genesis/parents?tab1=studentdata&tab2=gradebook&tab3=weeklysummary&studentid=100916&action=form";

    public static void main (String[] args){
        String display = postData(link,formData);
        System.out.print(display);
    }

    public static String postData (String url, String data){
        URL link = null;
        HttpURLConnection connection = null;
        StringBuffer stringBuffer = new StringBuffer();
        DataOutputStream dataOutputStream = null;
        String document = null;

        try {
            link = new URL(url);
        }catch (Exception e){System.out.print(e);}
        try {
            connection = (HttpURLConnection) link.openConnection();
        }catch (Exception e){System.out.print(e);}

        connection.setRequestProperty("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        connection.setRequestProperty("User-Agent","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        connection.setRequestProperty("Accept-Encoding","gzip, deflate, br");
        connection.setRequestProperty("Accept-Language","en-US,en;q=0.8");
        connection.setRequestProperty("Cookie","");
        connection.setDoInput(true);
        connection.setDoOutput(true);
        connection.setInstanceFollowRedirects(true);
        //setCookie(connection);

        //post data
        String postString = data;
        stringBuffer.append(postString);
        try {
            connection.connect();
        }catch (Exception e){System.out.print(e);}
        try {
            dataOutputStream = new DataOutputStream(connection.getOutputStream());
        }catch (Exception e){System.out.print(e);}
        try {
            IOUtils.write(stringBuffer.toString(),dataOutputStream,"UTF-8");
        }catch (Exception e){System.out.print(e);}

        //handle redirects
        try {
            if(connection.getResponseCode() == HttpURLConnection.HTTP_MOVED_TEMP
                    || connection.getResponseCode() == HttpURLConnection.HTTP_MOVED_PERM
                    || connection.getResponseCode() == HttpURLConnection.HTTP_SEE_OTHER){
                String redirectURL = connection.getHeaderField("Location");
                String cookie = connection.getHeaderField("Set-Cookie");
                URL redURL = null;
                try {
                    redURL = new URL(redirectURL);
                }catch (Exception e){System.out.print(e);}
                connection = (HttpURLConnection)redURL.openConnection();
                connection.setRequestProperty("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
                connection.setRequestProperty("User-Agent","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
                connection.setRequestProperty("Accept-Encoding","gzip, deflate, br");
                connection.setRequestProperty("Accept-Language","en-US,en;q=0.8");
                connection.setRequestProperty("Cookie",cookie);
                connection.setDoInput(true);
                connection.setDoOutput(true);
                connection.setInstanceFollowRedirects(true);
            }
        }catch (Exception e){System.out.print(e);}

        InputStream inputStream = null;
        try {
            inputStream = connection.getInputStream();
            document = IOUtils.toString(inputStream,"UTF-8");
        } catch (Exception e){System.out.print(e);}
        return document;
    }

    public static void setCookie(HttpURLConnection httpURLConnection){
    }
}
I originally tried sending the initial POST request to the request URL shown in the Network tab of Inspect Element (https://parents.mtsd.k12.nj.us/genesis/j_security_check); however, this returns an error and no HTML data. Any help is appreciated, as this is the first time I am attempting something of this nature.
Update: After experimenting with the login process, I noticed that the cookie I obtain contains only something along the lines of "lastvisit=95FD925038EF488AA22719B64FB5C4A3" but is missing the "JSESSION_ID". I am unsure as to whether this is causing the problem or not.
Update #2: I edited my code to follow the suggestions offered; however, I still get the HTML of the login page rather than the grade data page. I also added print statements to display the cookies I obtained, and I noticed that the "JsessionID" cookie is missing; I only get the "lastvisit" cookie.
public class Scraper {

    static String formData = "user&pass";
    static String link = "https://parents.mtsd.k12.nj.us/genesis/parents?tab1=studentdata&tab2=gradebook&tab3=weeklysummary&studentid=100916&action=form";

    public static void main (String[] args){
        String display = postData(link,formData);
        System.out.print(display);
    }

    public static String postData (String url, String data){
        URL link = null;
        HttpURLConnection connection = null;
        StringBuffer stringBuffer = new StringBuffer();
        DataOutputStream dataOutputStream = null;
        String document = null;

        try {
            link = new URL(url);
        }catch (Exception e){System.out.print(e);}
        try {
            connection = (HttpURLConnection) link.openConnection();
            connection.setRequestMethod("GET");
        }catch (Exception e){System.out.print(e);}

        connection.setRequestProperty("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        connection.setRequestProperty("User-Agent","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        connection.setRequestProperty("Accept-Encoding","gzip, deflate, br");
        connection.setRequestProperty("Accept-Language","en-US,en;q=0.8");
        connection.setRequestProperty("Cookie","");
        connection.setDoInput(true);
        connection.setDoOutput(true);
        //connection.setInstanceFollowRedirects(true);
        //setCookie(connection);
        String cookie = connection.getHeaderField("Set-Cookie"); //get cookies for session

        //try {
        //    connection.connect();
        //}catch (Exception e){System.out.print(e);}
        //try {
        //    dataOutputStream = new DataOutputStream(connection.getOutputStream());
        //}catch (Exception e){System.out.print(e);}
        //try {
        //    IOUtils.write(stringBuffer.toString(),dataOutputStream,"UTF-8");
        //}catch (Exception e){System.out.print(e);}

        //handle redirects
        try {
            //post data
            String postString = data;
            stringBuffer.append(postString);
            URL redURL = null;
            try {
                redURL = new URL("https://parents.mtsd.k12.nj.us/genesis/j_security_check");
            }catch (Exception e){System.out.print(e);}
            connection = (HttpURLConnection)redURL.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
            connection.setRequestProperty("User-Agent","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
            connection.setRequestProperty("Accept-Encoding","gzip, deflate, br");
            connection.setRequestProperty("Accept-Language","en-US,en;q=0.8");
            connection.setRequestProperty("Cookie",cookie);
            connection.setDoInput(true);
            connection.setDoOutput(true);
            connection.setInstanceFollowRedirects(true);
            //connection.connect();
        }catch (Exception e){System.out.print(e);}

        InputStream inputStream = null;
        String cookie2 = null;
        try {
            dataOutputStream = new DataOutputStream(connection.getOutputStream());
        }catch (Exception e){System.out.print(e);}
        try {
            IOUtils.write(stringBuffer.toString(),dataOutputStream,"UTF-8");
            System.out.println(stringBuffer.toString());
            cookie2 = connection.getHeaderField("Set-Cookie"); //get cookies for session
        }catch (Exception e){System.out.print(e);}

        URL fLink = null;
        try {
            fLink = new URL("https://parents.mtsd.k12.nj.us/genesis/parents?tab1=studentdata&tab2=gradebook&tab3=weeklysummary&studentid=100916&action=form" + stringBuffer.toString());
        }catch (Exception e){System.out.print(e);}
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) fLink.openConnection();
            conn.setRequestMethod("GET");
        }catch (Exception e){System.out.print(e);}

        conn.setRequestProperty("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
        conn.setRequestProperty("User-Agent","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36");
        conn.setRequestProperty("Accept-Encoding","gzip, deflate, br");
        conn.setRequestProperty("Accept-Language","en-US,en;q=0.8");
        conn.setRequestProperty("Cookie",cookie);
        conn.setDoInput(true);
        conn.setDoOutput(true);
        try {
            conn.connect();
            inputStream = conn.getInputStream();
            document = IOUtils.toString(inputStream,"UTF-8");
        } catch (Exception e){System.out.print(e);}

        System.out.println("cookie -- " + cookie);
        System.out.println("cookie2 -- " + cookie2);
        return document;
    }
}
The output of my cookie print statements is as follows (for experimental purposes):
cookie -- lastvisit=A1753DA7F2454A03B58DF8CBD39C22C4; Expires=Tue, 27-Mar-2018 18:27:55 GMT
cookie2 -- null
6 Answers
Answers 1
This site uses the Java Servlet authentication mechanism. When you try to get any page that is password protected, the server internally issues a forward that returns the login page content. After you post a valid username/password, it forwards you to the requested URL.
Note that this is not a redirect, it is managed internally by the application server and the client does not see what is going on at the server side.
To get this to work you need to issue a GET to the page you are trying to retrieve. The server responds with the login page content and the session id as a cookie. You have to save the session id and send it when you post to the form action.
Since I don't have valid credentials to test I cannot guarantee that this will work. With invalid credentials, the server will issue a number of redirects which will eventually change the session, probably because it is invalidated and recreated.
Please try this with valid credentials and let me know if it works. If there are redirects on successful login maybe it will be necessary to disable automatic redirects and do them programmatically, checking for session id changes. Depending on the application server, it may change the session id for security purposes.
That said, this MAY work when you post valid credentials.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;

public class Scraper {

    private static final String USERNAME = "user@test.com";
    private static final String PASSWORD = "secret";
    private static final String GET_URL = "https://parents.mtsd.k12.nj.us/genesis/parents?tab1=studentdata&tab2=gradebook&tab3=weeklysummary&studentid=100916&action=form";
    private static final String POST_URL = "https://parents.mtsd.k12.nj.us/genesis/j_security_check";

    public static void main(String[] args) {
        String cookies = doGet(GET_URL);
        doPost(POST_URL, cookies);
    }

    /**
     * Send the initial GET request which will forward to the login page
     * and retrieve cookies sent by the server.
     * Cookies are formatted according to HTTP specification so they can be
     * passed to the next request Cookie header.
     * @param getURL URL to get
     */
    public static String doGet(String getURL) {
        StringBuilder formattedCookies = new StringBuilder();
        try {
            URL url = new URL(getURL);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            InputStream is = conn.getInputStream();
            // Read every Set-Cookie header, not just one of them
            Map<String, List<String>> headers = conn.getHeaderFields();
            List<String> cookies = headers.get("Set-Cookie");
            ListIterator<String> it = cookies.listIterator();
            while (it.hasNext()) {
                // Keep only the leading name=value pair; drop attributes such as Expires
                String[] parts = it.next().split("; ");
                formattedCookies.append(parts[0]);
                if (it.hasNext()) {
                    formattedCookies.append("; ");
                }
            }
            System.out.println("\n\nGET OUTPUT");
            printContent(is);
        } catch (Exception e) {
            System.out.println(e.getLocalizedMessage());
        }
        return formattedCookies.toString();
    }

    /**
     * Post the form parameters and get page content.
     * @param postURL URL to post to
     * @param cookies The cookies to send
     */
    public static void doPost(String postURL, String cookies) {
        try {
            // Both values are URL-encoded so special characters survive the form post
            String postData = String.format("j_username=%s&j_password=%s",
                    URLEncoder.encode(USERNAME, "UTF-8"),
                    URLEncoder.encode(PASSWORD, "UTF-8"));
            URL url = new URL(postURL);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setInstanceFollowRedirects(true);
            conn.setRequestProperty("Cookie", cookies);
            OutputStreamWriter out = new OutputStreamWriter(conn.getOutputStream());
            out.write(postData);
            out.close();
            InputStream is = conn.getInputStream();
            System.out.println("\n\nPOST OUTPUT");
            printContent(is);
        } catch (Exception e) {
            System.out.println(e.getLocalizedMessage());
        }
    }

    public static void printContent(InputStream is) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(is));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
    }
}
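If a successful login does answer with redirects, as this answer anticipates, one option is to follow them by hand so that a changed session id is not lost. The following is only a sketch under assumptions: the class name `ManualRedirects` and its helpers are hypothetical, the caller is assumed to have called `setInstanceFollowRedirects(false)` on the first connection, and it has not been tested against the site in question.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class ManualRedirects {

    /** Stores the name=value pair of each Set-Cookie header; a newer value replaces an older one. */
    public static void remember(Map<String, String> jar, List<String> setCookieHeaders) {
        if (setCookieHeaders == null) {
            return;
        }
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq > 0) {
                jar.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
    }

    /** Builds a single Cookie request header value from the stored pairs. */
    public static String cookieHeader(Map<String, String> jar) {
        StringJoiner joiner = new StringJoiner("; ");
        jar.forEach((name, value) -> joiner.add(name + "=" + value));
        return joiner.toString();
    }

    /**
     * Follows up to maxHops redirects manually, sending the current cookies on each hop.
     * Assumes the initial connection was created with automatic redirects disabled.
     */
    public static HttpURLConnection follow(HttpURLConnection conn, Map<String, String> jar, int maxHops)
            throws IOException {
        for (int hop = 0; hop < maxHops; hop++) {
            int status = conn.getResponseCode();
            remember(jar, conn.getHeaderFields().get("Set-Cookie"));
            boolean redirect = status == HttpURLConnection.HTTP_MOVED_PERM
                    || status == HttpURLConnection.HTTP_MOVED_TEMP
                    || status == HttpURLConnection.HTTP_SEE_OTHER;
            String location = conn.getHeaderField("Location");
            if (!redirect || location == null) {
                return conn;
            }
            // Resolve a possibly relative Location against the current URL
            URL next = new URL(conn.getURL(), location);
            conn = (HttpURLConnection) next.openConnection();
            conn.setInstanceFollowRedirects(false);
            conn.setRequestProperty("Cookie", cookieHeader(jar));
        }
        return conn; // hop limit reached; the last connection has not been inspected yet
    }

    public static void main(String[] args) {
        // Offline demonstration with made-up cookie values: the second response replaces the session id
        Map<String, String> jar = new LinkedHashMap<>();
        remember(jar, List.of("lastvisit=A1; Expires=Tue, 27-Mar-2018 18:27:55 GMT", "JSESSIONID=old; Path=/genesis"));
        remember(jar, List.of("JSESSIONID=new; Path=/genesis"));
        System.out.println(cookieHeader(jar));
    }
}
```

Keeping the cookies in a map rather than a plain string means a session id that is reissued during login overwrites the stale one instead of being sent alongside it.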
Answers 2
The GET URL simply responds with whatever HTML the server returns for that request. Since you want specific HTML, you can use a different endpoint that actually returns the HTML you need.
Answers 3
A few suggestions:
- The login request has to be a POST request, not a GET request.
- You need to provide a cookie with the login POST request. Therefore, you might want to make a GET request to the server first and use the Set-Cookie response value in the subsequent POST.
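One way to carry cookies from the first GET into the login POST without copying headers by hand is the JDK's own `java.net.CookieManager`. Once installed as the default `CookieHandler`, `HttpURLConnection` requests made afterwards in the same JVM store and resend cookies automatically. The sketch below is an assumption-laden illustration: the class name `CookieJarSketch`, the host `example.test`, and the cookie values are made up, and the `main` method simulates a server response offline instead of contacting a real site.

```java
import java.io.IOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CookieJarSketch {

    /** Installs a JVM-wide cookie jar that accepts all cookies and returns it. */
    public static CookieManager installCookieJar() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    /** Returns the cookies the jar would send to the given URI, joined into one string. */
    public static String cookiesFor(CookieManager manager, URI uri) throws IOException {
        Map<String, List<String>> headers = manager.get(uri, new HashMap<>());
        List<String> values = headers.getOrDefault("Cookie", List.of());
        return String.join("; ", values);
    }

    public static void main(String[] args) throws IOException {
        CookieManager manager = installCookieJar();

        // Simulate the Set-Cookie headers of a first GET (hypothetical host and values)
        URI loginPage = URI.create("https://example.test/app/login");
        manager.put(loginPage, Map.of("Set-Cookie",
                List.of("JSESSIONID=abc123; Path=/app", "lastvisit=XYZ; Path=/app")));

        // Any later request under /app on the same host would carry these cookies
        System.out.println(cookiesFor(manager, URI.create("https://example.test/app/page")));
    }
}
```

With the jar installed before the first request, the GET and the subsequent POST can be written without any explicit `Cookie` or `Set-Cookie` handling.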
Answers 4
Try switching to HttpsURLConnection. Also make certain that you're sending all of the parameters the sign-in is expecting (it may very well want the name of the button). If you haven't seen it yet, here's a quick tutorial using Java to log in to a secure website: http://www.mkyong.com/java/how-to-automate-login-a-website-java-example/
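To make sure every expected parameter is sent and correctly escaped, the form body can be built from a map of fields. This is a minimal sketch: the class name `FormBody` is hypothetical, and the `submit=Login` field is only a placeholder for whatever button name the real login form uses, which has to be read from the page's HTML.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {

    /** Encodes the fields as application/x-www-form-urlencoded, keeping insertion order. */
    public static String encode(Map<String, String> fields) throws UnsupportedEncodingException {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), "UTF-8")
                    + "=" + URLEncoder.encode(field.getValue(), "UTF-8"));
        }
        return body.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("j_username", "user@test.com");
        fields.put("j_password", "p&ss word");
        fields.put("submit", "Login"); // hypothetical button field; check the actual form
        System.out.println(encode(fields));
    }
}
```

Encoding each key and value separately keeps characters such as `&` or `@` in a password or username from being misread as field separators.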
Answers 5
Try using the OkHttp library, then:
public static String postData(String url, String data){
    try {
        OkHttpClient client = new OkHttpClient().newBuilder()
                .connectTimeout(180, TimeUnit.SECONDS)
                .readTimeout(300, TimeUnit.SECONDS)
                .writeTimeout(300, TimeUnit.SECONDS)
                .build();
        MediaType mediaType = MediaType.parse("application/x-www-form-urlencoded");
        RequestBody body = RequestBody.create(mediaType, data);
        Request request = new Request.Builder()
                .url(url)
                .post(body)
                .addHeader("Accept","text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8")
                .addHeader("User-Agent","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36")
                .addHeader("Accept-Encoding","gzip, deflate, br")
                .addHeader("Accept-Language","en-US,en;q=0.8")
                .build();
        // Renamed from "body" so it no longer clashes with the RequestBody variable above
        String responseBody = client.newCall(request).execute().body().string();
        return responseBody;
    }catch(Exception e){
        e.printStackTrace();
        return "An error occurred";
    }
}
Answers 6
You are trying to retrieve the Set-Cookie header from the connection, which if you notice in the headers, has two entries. One with the last visited entry and another with the JSessionID entry.
In your cookie retrieval the JSessionID entry is getting ignored. Retrieve the entries as a List and set it properly.
List<String> cookies = connection.getHeaderFields().get("Set-Cookie");
for (String cookie : cookies) {
    connection.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
}
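Note that request properties can only be set before a connection is opened, so the collected cookies belong on the next request rather than on the connection they were read from. A possible way to prepare them is to join every `name=value` pair into one `Cookie` header value. The helper below is a sketch: the class name `CookieHeaders` is hypothetical and the JSESSIONID value in `main` is made up for illustration.

```java
import java.util.List;
import java.util.StringJoiner;

public class CookieHeaders {

    /** Joins the name=value part of every Set-Cookie header into one Cookie header value. */
    public static String toCookieHeader(List<String> setCookieHeaders) {
        StringJoiner joiner = new StringJoiner("; ");
        if (setCookieHeaders != null) {
            for (String header : setCookieHeaders) {
                // Everything after the first ';' is an attribute (Expires, Path, ...) and is not sent back
                joiner.add(header.split(";", 2)[0].trim());
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        List<String> received = List.of(
                "JSESSIONID=0123456789ABCDEF; Path=/genesis; Secure", // made-up value
                "lastvisit=A1753DA7F2454A03B58DF8CBD39C22C4; Expires=Tue, 27-Mar-2018 18:27:55 GMT");
        System.out.println(toCookieHeader(received));
    }
}
```

The result would then be passed to `setRequestProperty("Cookie", ...)` on a freshly opened connection for the login POST.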