I have the following Wikipedia API search query:
I just want to list famous people - is there a way to do that?
3 Answers
Answers 1
There isn't an exact way to limit your search results to only famous people. However, you can use a few different filters in Wikipedia's CirrusSearch to roughly narrow your results to people:
incategory:
Can you find a category that includes the people you want? Categories may not be a great solution, since they may be inconveniently specific.
linksto:
Do articles about people link to a common article?
hastemplate:
Can you find a template that is used on biographies of famous people? The template {{birth date}} may be a good solution (if it's fine to limit your search to mostly non-fictional people with non-disputed known birthdates).
For example, run your same search with hastemplate:Birth_date added to see mostly people:
{
  "batchcomplete": "",
  "continue": {
    "gsroffset": 20,
    "continue": "gsroffset||"
  },
  "query": {
    "pages": {
      "92733": {
        "pageid": 92733,
        "ns": 0,
        "title": "Albert A. Michelson",
        "index": 14,
        "thumbnail": {
          "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Albert_Abraham_Michelson2.jpg/71px-Albert_Abraham_Michelson2.jpg",
          "width": 71,
          "height": 100
        },
        "pageimage": "Albert_Abraham_Michelson2.jpg",
        "extract": "<p><b>Albert Abraham Michelson</b> (surname pronunciation anglicized as \"Michael-son\", December 19, 1852 \u2013 May 9, 1931) was an American physicist known for his work on the measurement of the speed of light and especially for the Michelson\u2013Morley experiment.</p>"
      },
      "736": {
        "pageid": 736,
        "ns": 0,
        "title": "Albert Einstein",
        "index": 1,
        "thumbnail": {
          "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg/76px-Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
          "width": 76,
          "height": 100
        },
        "pageimage": "Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
        "extract": "<p><b>Albert Einstein</b> (<span><span>/<span><span title=\"/\u02c8/ primary stress follows\">\u02c8</span><span title=\"/a\u026a/ long 'i' in 'tide'\">a\u026a</span><span title=\"'n' in 'no'\">n</span><span title=\"'s' in 'sigh'\">s</span><span title=\"'t' in 'tie'\">t</span><span title=\"/a\u026a/ long 'i' in 'tide'\">a\u026a</span><span title=\"'n' in 'no'\">n</span></span>/</span></span>; <small>German:</small> <span title=\"Representation in the International Phonetic Alphabet (IPA)\">[\u02c8alb\u025b\u0250\u032ft \u02c8a\u026an\u0283ta\u026an]</span>; 14 March 1879 \u2013 18 April 1955) was a German-born theoretical physicist.</p>"
      },
      "1139788": {
        "pageid": 1139788,
        "ns": 0,
        "title": "Alfred Einstein",
        "index": 6,
        "thumbnail": {
          "source": "https://upload.wikimedia.org/wikipedia/en/thumb/1/12/Alfred_Einstein.jpg/70px-Alfred_Einstein.jpg",
          "width": 70,
          "height": 100
        },
        "pageimage": "Alfred_Einstein.jpg",
        "extract": "<p><b>Alfred Einstein</b> (December 30, 1880 \u2013 February 13, 1952) was a German-American musicologist and music editor.</p>"
      },
      ...
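As a rough sketch of how such a request could be assembled, the snippet below builds a search URL with the hastemplate: filter appended to the search term. The original query is not shown in the question, so the parameter set here (generator=search with pageimages and extracts, which is what the response above suggests) is an assumption, and build_people_search_url is a made-up helper name. It only builds the URL; it does not send any request.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_people_search_url(term, template="Birth_date", limit=20):
    """Build a MediaWiki search URL restricted to pages using `template`.

    Hypothetical helper: the parameter choice is a guess at the asker's
    query, based on the fields visible in the sample response.
    """
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        # CirrusSearch filter appended to the plain search term
        "gsrsearch": f"{term} hastemplate:{template}",
        "gsrlimit": limit,
        "prop": "pageimages|extracts",
        "exintro": 1,        # only the intro section of each article
        "exlimit": "max",
        "pithumbsize": 100,  # thumbnail size in pixels
        "pilimit": "max",
    }
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_people_search_url("einstein"))
```

Swapping the filter for incategory: or linksto: should only require changing the string passed as gsrsearch.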
Someday, you should be able to use Wikidata to search for entities on Wikipedia that are an instance of human. For now, we'll have to work with search filters.
Answers 2
My workaround for now is to filter search results server-side, by only showing articles that have birth_date in their revision content.
The bounty is still available if someone finds a way around this.
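A minimal sketch of that server-side filter might look like the following. It assumes the wikitext of each result has already been fetched (for example through the revisions content of each page) and collected into a dict keyed by title; keep_probable_people and the sample wikitext are made up for illustration.

```python
def keep_probable_people(wikitext_by_title, marker="birth_date"):
    """Return titles whose wikitext mentions `marker` (case-insensitive).

    Hypothetical helper: `wikitext_by_title` maps an article title to its
    raw revision content, which the caller is assumed to have fetched.
    """
    marker = marker.lower()
    return [
        title
        for title, wikitext in wikitext_by_title.items()
        if marker in wikitext.lower()
    ]


if __name__ == "__main__":
    # Made-up wikitext snippets, not real article content.
    sample = {
        "Some Person": "{{Infobox scientist | birth_date = 1 January 1900 }}",
        "Some Place": "{{Infobox settlement | population = 1000 }}",
    }
    print(keep_probable_people(sample))
```

This is a plain substring check, so any article that happens to mention birth_date for another reason would also pass.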
Answers 3
I think every article about a person will have "... birthDate)" (if the person is still alive) or "birthDate - died)" in the first line of the extract. So I guess you can keep only the records whose extract matches this regex:
^[^.]*\d{4}\)[^.]*\..*
This will only match texts with something like 2001) in the first sentence.
If it's safe to assume that other records don't have it (I'm not sure that it is), then you can stop there. If not, at least you filtered a few more records before checking the revision.
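To illustrate, here is a small sketch that applies the regex above to extract strings. The first sample is the Alfred Einstein extract from the response in the first answer; the second is a made-up non-person extract, and looks_like_person is a hypothetical name.

```python
import re

# The regex from this answer: a four-digit year followed by ")"
# before the first period of the extract.
PERSON_EXTRACT = re.compile(r"^[^.]*\d{4}\)[^.]*\..*")


def looks_like_person(extract):
    """Return True if the extract's first sentence contains e.g. '1952)'."""
    return PERSON_EXTRACT.match(extract) is not None


if __name__ == "__main__":
    person = (
        "<p><b>Alfred Einstein</b> (December 30, 1880 \u2013 February 13, 1952) "
        "was a German-American musicologist and music editor.</p>"
    )
    # Made-up extract for a non-person article.
    other = "<p>An <b>example thing</b> is something that is not a person.</p>"
    print(looks_like_person(person), looks_like_person(other))
```

Note that [^.]* cannot cross a period, so an extract with an abbreviation or initial before the dates (something like "John Q. Example (1900 ...") would be rejected even though it describes a person.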