I have to handle deep nesting of ul, ol, and li tags and produce the same view that the browser gives. I want to achieve the following example in a PDF file:
text = " <body> <ol> <li>One</li> <li>Two <ol> <li>Inner One</li> <li>inner Two <ul> <li>hey <ol> <li>hiiiiiiiii</li> <li>why</li> <li>hiiiiiiiii</li> </ol> </li> <li>aniket </li> </li> </ul> <li>sup </li> <li>there </li> </ol> <li>hey </li> <li>Three</li> </li> </ol> <ol> <li>Introduction</li> <ol> <li>Introduction</li> </ol> <li>Description</li> <li>Observation</li> <li>Results</li> <li>Summary</li> </ol> <ul> <li>Introduction</li> <li>Description <ul> <li>Observation <ul> <li>Results <ul> <li>Summary</li> </ul> </li> </ul> </li> </ul> </li> <li>Overview</li> </ul> </body>"
I have to use Prawn for this task, but Prawn doesn't support HTML tags. So I came up with a solution using Nokogiri: I parse the HTML and later remove the tags with gsub. The solution below works for part of the content above, but the problem is that the nesting of ul and ol can vary.
    require 'nokogiri'

    # Marker rules per list type and nesting level (only level 1 is filled in).
    RULES = {
      ol: {
        1 => ->(index) { "#{index + 1}. " },
        2 => ->(index) { "#{}" },
        3 => ->(index) { "#{}" },
        4 => ->(index) { "#{}" }
      },
      ul: {
        1 => ->(_) { "\u2022 " },
        2 => ->(_) { "" },
        3 => ->(_) { "" },
        4 => ->(_) { "" },
      }
    }

    def ol_rule(group, deepness: 1)
      group.search('> li').each_with_index do |item, i|
        prefix = RULES[:ol][deepness].call(i)
        item.prepend_child(prefix)
        descend(item, deepness + 1)
      end
    end

    def ul_rule(group, deepness: 1)
      group.search('> li').each_with_index do |item, i|
        prefix = RULES[:ul][deepness].call(i)
        item.prepend_child(prefix)
        descend(item, deepness + 1)
      end
    end

    # Process the lists nested directly inside a list item.
    def descend(item, deepness)
      item.search('> ol').each do |ol|
        ol_rule(ol, deepness: deepness)
      end
      item.search('> ul').each do |ul|
        ul_rule(ul, deepness: deepness)
      end
    end

    doc = Nokogiri::HTML.fragment(text)
    doc.search('ol').each do |group|
      ol_rule(group, deepness: 1)
    end
    doc.search('ul').each do |group|
      ul_rule(group, deepness: 1)
    end
    puts doc.inner_text

Current output:

1. One 2. Two 1. Inner One 2. inner Two • hey 1. hiiiiiiiii 2. why 3. hiiiiiiiii • aniket 3. sup 4. there 3. hey 4. Three 1. Introduction 1. Introduction 2. Description 3. Observation 4. Results 5. Summary • Introduction • Description • Observation • Results • Summary • Overview
Problem
1) How do I handle spacing (indentation) when working with ul and ol tags?
2) How do I handle deep nesting when li elements appear inside a ul or inside an ol?
2 Answers
Answers 1
I've come up with a solution that handles multiple indentation levels with configurable numbering rules per level:

    require 'nokogiri'

    ROMANS = %w[i ii iii iv v vi vii viii ix]

    # Marker rules per list type and nesting level.
    RULES = {
      ol: {
        1 => ->(index) { "#{index + 1}. " },
        2 => ->(index) { "#{('a'..'z').to_a[index]}. " },
        3 => ->(index) { "#{ROMANS[index]}. " },
        4 => ->(index) { "#{ROMANS[index].upcase}. " }
      },
      ul: {
        1 => ->(_) { "\u2022 " },
        2 => ->(_) { "\u25E6 " },
        3 => ->(_) { "* " },
        4 => ->(_) { "- " },
      }
    }

    # Prefix each direct li child with its marker, then handle nested lists.
    def ol_rule(group, deepness: 1)
      group.search('> li').each_with_index do |item, i|
        prefix = RULES[:ol][deepness].call(i)
        item.prepend_child(prefix)
        descend(item, deepness + 1)
      end
    end

    def ul_rule(group, deepness: 1)
      group.search('> li').each_with_index do |item, i|
        prefix = RULES[:ul][deepness].call(i)
        item.prepend_child(prefix)
        descend(item, deepness + 1)
      end
    end

    # Process the lists nested directly inside a list item.
    def descend(item, deepness)
      item.search('> ol').each do |ol|
        ol_rule(ol, deepness: deepness)
      end
      item.search('> ul').each do |ul|
        ul_rule(ul, deepness: deepness)
      end
    end

    # text is the HTML string from the question.
    doc = Nokogiri::HTML.fragment(text)
    doc.search('ol:root').each do |group|
      ol_rule(group, deepness: 1)
    end
    doc.search('ul:root').each do |group|
      ul_rule(group, deepness: 1)
    end
You can then remove the tags or use doc.inner_text depending on your environment.
Two caveats though:
- Your entry selector must be chosen carefully. I used your snippet verbatim, without a root element, so I had to use ul:root/ol:root. Maybe "body > ol" works for your situation too. Another option is to select every ol/ul and then keep only those that have no list parent.
- Using your example verbatim, Nokogiri does not handle the last two list items of the first ol group ("hey", "Three") very well. When parsed with Nokogiri, these elements have already "left" their ol tree and are placed in the root tree.
Current Output:
1. One 2. Two a. Inner One b. inner Two ◦ hey ◦ hey 3. hey 4. hey hey Three 1. Introduction a. Introduction 2. Description 3. Observation 4. Results 5. Summary • Introduction • Description ◦ Observation * Results - Summary • Overview
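
One limitation of the rule tables above is that they stop at level 4, the alphabet stops at 26 entries, and ROMANS stops at nine, so a deeper or longer list would hit nil. Below is a minimal sketch of one way around that, assuming it is acceptable to cycle through the same four marker styles on deeper levels. The names to_roman, to_letters, marker_for, OL_STYLES and UL_STYLES are hypothetical helpers for illustration, not part of Nokogiri or Prawn.

```ruby
# Hypothetical helpers (assumption: the four marker styles repeat on deeper levels).
ROMAN_VALUES = {
  1000 => 'm', 900 => 'cm', 500 => 'd', 400 => 'cd',
  100 => 'c', 90 => 'xc', 50 => 'l', 40 => 'xl',
  10 => 'x', 9 => 'ix', 5 => 'v', 4 => 'iv', 1 => 'i'
}.freeze

# Convert a positive integer to a lowercase Roman numeral.
def to_roman(number)
  ROMAN_VALUES.reduce('') do |result, (value, glyph)|
    count, number = number.divmod(value)
    result + glyph * count
  end
end

# Convert a zero-based index to a, b, ..., z, aa, ab, ...
def to_letters(index)
  index.times.inject('a') { |label, _| label.succ }
end

OL_STYLES = [
  ->(i) { (i + 1).to_s },
  ->(i) { to_letters(i) },
  ->(i) { to_roman(i + 1) },
  ->(i) { to_roman(i + 1).upcase }
].freeze

UL_STYLES = ["\u2022", "\u25E6", '*', '-'].freeze

# Return the marker for a list type (:ol or :ul), a 1-based nesting
# depth and a zero-based item index. Depth 5 reuses the style of depth 1.
def marker_for(type, depth, index)
  slot = (depth - 1) % 4
  type == :ol ? "#{OL_STYLES[slot].call(index)}. " : "#{UL_STYLES[slot]} "
end
```

With this in place, ol_rule and ul_rule could call marker_for(:ol, deepness, i) or marker_for(:ul, deepness, i) instead of indexing into RULES directly.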
Answers 2
Whenever you are in an ol, li or ul element, you must recursively check for ol, li and ul children. If there are none, return what has been discovered as a substructure; if there are, call the same function on the new node and add its return value to the current structure.
You perform a different action on each node depending on its type, no matter where it is, and the function then automatically repackages everything.
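
The recursion described here can be sketched without any parser, assuming the HTML has already been turned into a nested structure. SAMPLE_TREE, INDENT and render_list below are hypothetical names, and the hash layout (:type, :items, :text, :children) is an assumption made for illustration. Each call handles one list, emits one line per item, and recurses one level deeper for any list nested in that item, so the indentation follows directly from the recursion depth.

```ruby
# Hypothetical plain-Ruby stand-in for a parsed list: each item holds its
# text and, optionally, the lists nested directly inside it.
SAMPLE_TREE = {
  type: :ol,
  items: [
    { text: 'One' },
    { text: 'Two', children: [
      { type: :ul, items: [{ text: 'hey' }, { text: 'aniket' }] }
    ] },
    { text: 'Three' }
  ]
}.freeze

INDENT = '  '

# Return the list as an array of text lines, indented by nesting depth.
def render_list(list, depth = 0)
  list[:items].each_with_index.flat_map do |item, index|
    marker = list[:type] == :ol ? "#{index + 1}." : "\u2022"
    line = "#{INDENT * depth}#{marker} #{item[:text]}"
    # Base case: an item without children contributes only its own line.
    nested = item.fetch(:children, []).flat_map do |child|
      render_list(child, depth + 1)
    end
    [line, *nested]
  end
end

puts render_list(SAMPLE_TREE)
```

For the sample tree this prints the three numbered items, with the two bullet items indented one level under "2. Two". When writing to a PDF, the depth could drive the library's own indentation mechanism (for example Prawn's indent block) rather than leading spaces.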