LyricWiki talk archive for Community Portal
Things that a Bot could do

  1. Erase all the blank spaces in the start of lines between <lyric></lyric> tags
  2. Look for articles in Songs (Song_A...) categories that have no <lyric> tag, and put them in a category such as lyrickize (:P)
    • For the pages that use the old format (spaces at the beginning of lines), it could just automatically convert to the new format. -Sean Colombo
  3. Same as above but with pages with no Song or Songfooter template
  4. If there is no special page for that (I think Wikipedia has one), look for all pages with no categories and put them in an Uncategorized category or something

More Ideas?? --Unaiaia (talk) 10:22, 14 November 2006 (EST)

Coolness. There is a big list of bot ideas on ÜberBot's user page. These could probably be rolled in there.
-Sean Colombo 11:13, 14 November 2006 (EST)
Copied there. --Unaiaia (talk) 13:52, 14 November 2006 (EST)
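Bot idea 1 above (removing leading spaces inside lyric tags) could be sketched roughly as follows. This is only an illustration in Python, not ÜberBot's actual code; the function name strip_leading_spaces is hypothetical, and it assumes the page text is already available as a string.

```python
import re

# Matches one <lyric>...</lyric> block, including line breaks inside it.
LYRIC_BLOCK = re.compile(r"(<lyric>)(.*?)(</lyric>)", re.DOTALL | re.IGNORECASE)


def strip_leading_spaces(wikitext: str) -> str:
    """Remove leading spaces/tabs from every line inside <lyric> blocks only."""

    def clean(match: re.Match) -> str:
        open_tag, body, close_tag = match.groups()
        lines = [line.lstrip(" \t") for line in body.split("\n")]
        return open_tag + "\n".join(lines) + close_tag

    # Text outside the tags is left untouched.
    return LYRIC_BLOCK.sub(clean, wikitext)
```

Text outside the tags is deliberately left alone, since leading spaces elsewhere on a page may be intentional wiki formatting.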

LyricWiki in Wikipedia

Do you think it's a good idea to put links to LyricWiki in Wikipedia? Maybe you don't want that much exposure, or maybe you are already doing this. We could create a template such as {{lyricwiki|Artist}} that shows a pretty box saying that LyricWiki has lyrics for the songs by XXXX, with the logo at 30px :D --Unaiaia (talk) 10:22, 14 November 2006 (EST)

Sounds like a good idea. I think someone made a LyricWiki template at some point (I obviously don't use it that much because ppl would call it self-promotion). There is a Userbox on there ({{user LyricWiki|LyricWiki_Username_Here}}) too. I think that'd be cool though if someone did that.
-Sean Colombo 11:04, 14 November 2006 (EST)
Now we have a LyricWiki template. Use it to mark pages that have lyrics in LyricWiki: {{LyricWiki|Band_Name}}. The Band_Name parameter is optional; use it if the Wikipedia PAGENAME is different from LyricWiki's.
--Unaiaia (talk) 19:29, 17 November 2006 (EST)
The template got deleted, marked as spam, and also because this site has copyrighted material. Well, I'm pretty upset about this... I think this is a good site and I thought it would be cool for Wikipedia articles to be linked with it, but it looks like it isn't. --Unaiaia (talk) 16:34, 10 December 2006 (EST)
Well, there are a handful of bitter people on Wikipedia it seems, and unfortunately even if 10 people want to create and edit an article, when 1 person gets bitter (or paranoid of the RIAA), they can get things deleted quite quickly. I've seen a bunch of articles on there that just keep getting deleted and recreated :-/. It's sad. Thanks for trying anyway Unaiaia! Keep your chin up! :)
-Sean Colombo 20:24, 10 December 2006 (EST)

D4 sP3ll1|\|g && |_|zz0rzp4g3 so ttly UNawsum?

Hi. Does anyone else have a problem with the text on Katsluvfoosball's userpage and the spelling and reasons for the nominations at LyricWiki_talk:Song_of_the_Day? It's not like I'm going to attack anyone for making a spelling error or making a joke, but I'm beginning to think this user doesn't speak English at all, and it's getting serious now (one SOTD nomination, OK, it's funny, but now three in a row like so ttly awsum!). Just wondering what others think about this: am I the only one who's disturbed by it? If not, I'll ask the user to clean up his user page and correct the nominations, but perhaps I'm just overreacting here? --Mischko 05:10, 7 December 2006 (EST)

I have seen this for a while, and so far I have not reverted or banned the user because pages have not been vandalized, but I also think it is getting a little out of hand. Even user talk pages are for communicating between users and coordinating wiki editing, but Katsluvfoosball (talk,contrib) is using it more like a personal chat page. On top of that, I just looked at the contributions and no edits other than to the user page and Song of the Day nominations have been made. Also, it looks like it may be a shared account. User:Brandonsgrrl has already made another account and I saw another one, but I cannot remember the name.
- teknomunk 11:33, 7 December 2006 (EST)
Howdy. Yeah, I've seen this too and been keeping an eye on it (whilst I should be working on Pedlr) and I figured it would eventually come up as an issue.
I guess I see it this way... Kat appears to be a middleschool/highschool girl in band who quite possibly likes Brandon and LyricWiki. Although I'm completely baffled by what the userpage says (I'm fluent in 1337, but this is apparently a derivative dialect that has had just far too much sugar), the userpage itself is off in its own corner of the site and hasn't spilled out to cause any problems.
The general public may get sick of SOTD nominations in that format, so I'll go ask her to reword stuff. Otherwise I don't think she's hurting anyone, so we should just be happy that she's probably bringing all of her friends to the site & spreading the word. :)
Awsum. -Sean Colombo 22:07, 10 December 2006 (EST)
Well... just check User:Crazeeflootgrrl, User:Brandonsgrrl and User:Baybayrox. My opinion is that this is not a blog and that they should register on MySpace, where I'm sure more people will understand their language. They are "stealing" bandwidth and storage... what if their whole school comes here?? --Unaiaia (talk) 12:14, 12 December 2006 (EST)
At first I just kept an eye on it, but now it's beginning to annoy me. Apart from clogging up history lists and giving me more work, they're also eating up bandwidth and storage, as Unaiaia said. I've cleaned their talk and user pages now and left a message. If they continue to write this stuff I'll issue a vandalism warning and urge them to find out what the site is about and set up their own wiki if they want to chat with themselves (I can't help thinking at least a few of them are the same person) or their family. If anyone disagrees, they can leave a message on my talk page. --Mischko 04:50, 21 December 2006 (EST)

Go nuts with redirects!

Greetings all! On Saturday I updated the SOAP to attempt to find more songs by checking artist-redirects if the song page was not found directly.

For example, if someone makes a request for Prodigy:Hot Ride, after failing to find that page, the webservice will realize that since Prodigy points to The Prodigy, it should check for The Prodigy:Hot Ride (which it will find!).

So if you know of common abbreviations/misnomers/variations of band names, please put redirects in and you will help a lot of people. The webservice serves up almost as many lyrics every day as the site serves pages.
Thanks! -Sean Colombo 23:52, 11 December 2006 (EST)
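The redirect fallback described above might look something like the sketch below. It is a simplified Python illustration, not the webservice's real code: the find_song function and the two dictionaries standing in for the wiki's page and redirect tables are assumptions made for the example.

```python
from typing import Dict, Optional


def find_song(pages: Dict[str, str], redirects: Dict[str, str],
              artist: str, title: str) -> Optional[str]:
    """Look up Artist:Title, retrying via the artist redirect if needed.

    `pages` maps full page names to page text and `redirects` maps an
    artist page name to the page it redirects to (both hypothetical
    stand-ins for the wiki database).
    """
    direct = f"{artist}:{title}"
    if direct in pages:
        return pages[direct]

    # The direct page is missing: see whether the artist name is a redirect.
    target = redirects.get(artist)
    if target is not None:
        return pages.get(f"{target}:{title}")
    return None


pages = {"The Prodigy:Hot Ride": "(lyrics text)"}
redirects = {"Prodigy": "The Prodigy"}
print(find_song(pages, redirects, "Prodigy", "Hot Ride"))
```

With this shape, every artist redirect added on the wiki widens what the lookup can match without any change to the service itself.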

Double pages

Hello everybody! I noticed that one of my favorite bands, Hammerfall, has two different pages, and therefore two different sets of song pages.

That's not good; it's a big mess. I would be thankful if someone could help me join Hammerfall and HammerFall into one page called Hammerfall (because that's the spelling preferred at Swedish Wikipedia [1]), and try to move all the pages HammerFall:Song name to Hammerfall:Song_name. 09:54, 15 December 2006 (EST)

Maybe a job for Überbot? 11:15, 15 December 2006 (EST)
I batch moved all HammerFall pages to Hammerfall. They were probably added in the Other songs section. Please go through them manually to check if everything went ok, add the proper formatting templates and categorize them under their respective albums. When you are sure there have been no problems with the move, please leave a note on my talk page or paste the {{deletion}} template on HammerFall talk page so I can delete HammerFall and all subsequent pages. Have fun! --Mischko 06:09, 17 December 2006 (EST)
It looks good. I compared with HammerFall and with HammerFall:Other Songs, and all of the listed HammerFall: pages were either moved to Hammerfall: or linked in a Merge-template in Hammerfall:. But there are a few songs which were listed under different names on the Hammerfall page. They are:
Some of these pages have text in them, so I'll move those texts to the linked page or add a { {mergefrom} }-box. Then all that's left is to actually merge the double pages and clean up Hammerfall:Other Songs. 11:59, 20 December 2006 (EST)
OK, looks like something you can do by hand. If you need any help, just plug in a {{helpme}} template somewhere and I'll see what I can do. If you have left-over pages, stick {{deletion|Left-over from [[LyrikWiki talk:Community portal#Double pages|Hammerfall move]]}} on them and I'll remove them. Good luck! --Mischko 04:54, 21 December 2006 (EST)

SOAP/Server Responsiveness

Hi to everyone. Really enjoying using LyricWiki, but I'm having some problems updating the Amarok script I've created to fetch lyrics. Namely, I'm attempting to make use of SOAP. I have a procedure in place to search for lyrics so the search function only returning Moonlight Sonata by Beethoven doesn't pose a significant problem, but the time it seems to take the server to respond to requests is where I'm running into trouble.

The problem is that the script must contact the server to create a WSDL driver when it first starts. Theoretically, this isn't a problem, but the server seems to take up to a minute to respond, which means for the first minute Amarok is loaded the lyrics function won't work. It also takes about the same amount of time to respond to specific requests, but while inconvenient that isn't likely to result in people believing it is a buggy script. While the script is not as efficient as it could be using SOAP, it does work now and I'm not receiving complaints that the script doesn't work for the first minute after Amarok has loaded.

In a nutshell, what I'm wondering is whether this is merely a server load issue or a problem on my end. If it isn't a problem on my end, can I expect it to be cleared up so I can begin implementing SOAP, or should I wait for now?

I'm not complaining about the site by any means; I just want to know whether or not it's worth the effort to implement SOAP. Thanks.

-Rede 15:28, 15 December 2006 (EST)

It seems to have cleared up. I wonder if it's just time of day...
-Rede 01:57, 17 December 2006 (EST)
From the Official Blog:
Dec 14th 2006 - Server in pain

The server is in a bit of pain at the moment. Pages are taking a little while to load and some are getting dropped.

This is a problem that we haven’t had since moving to the new server about a month ago. 
--Unaiaia (talk) 06:17, 17 December 2006 (EST)

Songs in multiple languages w/ different titles

Hello, would it be possible to extend the Multiple Languages template to allow for a different title depending on the language instead of appending /(country code) ? I'm currently doing some editing on some Kent songs, and this would be useful for the Kent:Hagnesta Hill (1999) album which has a Swedish and an English version. Almost all titles of the album match, but often with a slight difference : Kent:Musik Non Stop -> Kent:Music Non Stop, Kent:Kevlarsjäl -> Kent:Kevlar Soul, ...

-M.Pomme 09:21, 17 December 2006 (EST)

Sure it is possible; what kind of interface do you want? Right now the template uses |Lang1=country code to specify the different language codes used to make links. What would work for you?
Alternatively, you could put redirects at the /country_code to the correctly titled song name.
- teknomunk (talk) 16:49, 19 December 2006 (EST)
The redirect stuff sounds like a good solution, I'll have to investigate how this works.
Merry Christmas !
M.Pomme 17:40, 25 December 2006 (EST)


Copied from Talk:Breaking Benjamin:The Diary Of Jane:
There is too much vandalism on LyricWiki. How do we stop it? ~~•Sean•gorter• 11:07, 17 December 2006 (EST)

I don't think there is too much vandalism on LyricWiki; most of the time it's people who just don't know the way we work. For me, the best way to fight it is fixing the vandalized pages and, if you don't have time, marking them so someone will fix them soon. Anyway, I think this belongs on the Community Portal, so I'm copying it there. --Unaiaia (talk) 11:13, 17 December 2006 (EST)
The only "real" vandalism I encounter is talk pages being spammed with links and junk. But as long as that only happens once in a while, it's not too hard to delete and in the worst case admins can ban the vandals temporarily or permanently. This is not Wikipedia and I hope we don't need to be as strict as Wikipedia and implement warning systems and vandalism revert bots. --Mischko 11:16, 17 December 2006 (EST)

iTunes Feed Outage

Of course it is needless to say this, because you have noticed this yourself already, but the iTunes top list on the main page has been empty for a few days already. --Mischko 04:50, 18 December 2006 (EST)

It looks like Apple moved the location of their feed. I updated the link and it appears to be working correctly.
- teknomunk (talk) 14:59, 18 December 2006 (EST)

Please some admin fix in some Spanish template...

The text that appears below the editing window in Spanish has two lines with spaces at the beginning, so it appears like this:

Por favor, ten en cuenta que todas las contribuciones a LyricWiki pueden ser editadas, modificadas o eliminadas por otros colaboradores.

Si no deseas que la gente corrija tus escritos sin piedad

y los distribuya libremente, entonces no los pongas aquí. También tú nos aseguras que escribiste esto tú mismo y eres dueño de los derechos de autor, o lo copiaste desde el dominio público u otra fuente libre.


Could some admin please fix this? I'm not sure where it is... and I don't think I'm able to edit it anyway :D --Unaiaia (talk) 05:02, 11 December 2006 (EST)

This needs to be handled by Sean because to change this you need access to the server; it cannot be done through the wiki.
- teknomunk (talk) 21:43, 18 December 2006 (EST)

Very, very big page

LyricWiki Possible Song Covers List 1 is so big that I actually can't even retrieve it from the server!!!

EDIT: I don't know why it doesn't appear here, but here you can see it --Unaiaia (talk) 07:37, 20 December 2006 (EST)

Hmm... yeah that was a long time ago that I had UberBot make that... I think I tried to split it up or something.
I should make a page that's just HTML links for that (then it wouldn't have to load all of that mediawiki initialization junk first) and then split it up by letters or something. ====> User:Sean Colombo/Todo
-Sean Colombo 16:35, 20 December 2006 (EST)

Messing with the ad system

Apparently Google AdSense has notoriously bad payouts (even though they won't say what their percentage is). Since ads still aren't covering hosting (which is a dangerous state to be in since it's adding up very quickly), I'm going to be messing around with other ad networks so that hopefully we don't chill in debt forever. Last week we had a boost in traffic and the server seemed like it couldn't handle it. If these new networks are good enough, maybe they'll be able to support more servers if traffic continues to increase.
I welcome any feedback on the new ad systems.
-Sean Colombo 16:49, 20 December 2006 (EST)

Which ad providers are you looking into? Once I have an idea of what it will look like I will be able to give a much better opinion on the subject. I think that the ads could be much more targeted to our audience. I, for example, don't like seeing every other ad as a ring tone advert or other random ad, but I am more likely to look into music purchase sites, digital or hard copy.
- - teknomunk (talk,E) 16:59, 20 December 2006 (EST)
I'm looking into AdBrite, ContextWeb, TextLinkAds, and Casale Media. They all have text ads (unfortunately some only have taller text-ads than the AdSense format), and right now I popped an AdBrite block in because it says that if it can't beat the price that Google was giving us, it will just display the AdSense ad code. If I get any decisive stats on how well they are working I will post.
Have you had any prior experiences with any of these networks?
-Sean Colombo 17:17, 20 December 2006 (EST)
I haven't had any experience in dealing with advertisers. From a glance, though, AdBrite, TextLinkAds and Casale Media look good. ContextWeb's page doesn't render correctly in Firefox and looks to be similar to AdWords.
The AdBrite block that is up now looks fine. TextLinkAds also looks like it is good. I can't find an example of what Casale Media ads look like. I didn't try too hard on the other one.
I hope that this works out and is not too intrusive. If things keep going the way they are, we will probably need another server soon.
- - teknomunk (talk,E) 17:54, 20 December 2006 (EST)
Casale Media didn't approve us, so I guess we don't need to worry about them... TextLinkAds approved us, and that's what the "Sponsors" section is. They will sell a number of single-line text links (the whole section will have a maximum of 8 lines), so this will take up a lot less space than the AdSense... so I've moved the AdBrite/AdSense blob down to the bottom. Hopefully this will let us both get the money for more servers, and at the same time actually improve our ability to use the menus, etc. That would be totally sweet if that works out.
So far it looks like the AdBrite network hasn't been able to beat the AdSense rate, so it's just been displaying AdSense.
I'll give updates in a week or so once stats start rolling in. :)
-Sean Colombo 10:56, 21 December 2006 (EST)

Artists that change their names

There was a discussion on Talk:As_Fast_As regarding artists that change their name, but I thought it should be discussed in a more open forum, so I bring it here. There were two options discussed. The first was having a redirect on the older name, denoting which albums are under which names, and leaving albums as they are. But then what do you do about albums or songs released under both names (note Skin The Kat, released under both names)? I'm not sure if they have the same lyrics, because I have not heard the older version. The other option was to have info boxes on both pages denoting the renaming of the artist. --Snoopdugiedog 03:50, 22 December 2006 (EST)

I think that albums and songs released under one artist name should stay under that name (and the artist page with the songs on it, of course). Info boxes sound like a good idea! Remiss 13:44, 22 December 2006 (EST)
I agree. The cleanest and fastest solution would be to cross-link the pages while keeping the songs under the name they were released under. When they were released under different names with different lyrics, make two pages and an infobox. When they were released with the same lyrics, one can be a redirect and we would simply add two {{Song}} templates. --Mischko 15:31, 22 December 2006 (EST)
Just to add to this, I've recently been working on the Maire Brennan page, but on albums after Two Horizons, she's used the name Moya Brennan. All her music is on the Maire Brennan page, so should I rename the page or just put a redirect link for anyone searching for music under her new name? Skyrose

Protected Pages

Do all these pages really have to be protected? I find it a little annoying... do we have something like semi-protection or something?? I don't know... I find it rather "slow" to use the talk page to change little things in a template... Just my opinion anyway --Unaiaia (talk) 15:35, 23 December 2006 (EST)

The reason they are protected is because they are each on several thousand pages. Any changes to the template will require the server updating all of those pages. Also if they were spammed, it would spam several thousand pages.
There is a semi-protect that only keeps anonymous users from editing pages. Semi-protect does not prevent logged-in people or bots, such as User:Willy on Wheels, from vandalizing these pages. It is not really a problem right now, but I don't want to give it a chance to happen.
There are relatively few people who would be interested in changing those pages and be able to do so correctly, and all of them probably will become administrators eventually. I am not trying to exclude people, and especially not you, from contributing.
- - teknomunk (talk,E) 16:34, 23 December 2006 (EST)
I believe (I'm not sure) that semiprotection offers protection also for new accounts and that you need a number of edits (say 100) to be able to edit these pages... Anyway, I understand what you mean... Btw... I'm not sure if I did thank you for the code you gave me... I didn't use Ruby before and I read a tutorial about it, it's interesting, I hope I have some time after February and I can do something with it! --Unaiaia (talk) 16:38, 23 December 2006 (EST)
I am not sure about the newly registered users, the protect page simply says 'None','Block unregistered users' and 'Sysops only' as the options. I looked and couldn't find any information on whether "Semi-protect" (Block unregistered users) limits newly registered users.
I left a note on Sean's talk page about getting you administrative privileges. I think that you have done a great job here so far and I would like to see you continue to do so.
I'm glad that you got the code. Ruby is a wonderful language to work in. I have worked in several languages and it is by far my favorite. I will probably be continuing to work on the code so I will probably send you an updated version after I get things cleaned up a bit. If things get a little cleaner, I may end up posting it on something like SourceForge.
Well, I think I will wait for an hour or so until the server load gets a little lighter to do any major work.
- - teknomunk (talk,E) 17:42, 23 December 2006 (EST)


I think this site is doing really well as far as content is concerned, and is probably ready for a slashdotting. When the site was dugg a while ago, that helped out immensely (and also indicates that the server probably stands a chance of surviving a slashdotting). We should probably wait until after New Years so that people are actually online again (traffic dips everywhere like crazy until a few days after the New Year when everyone gets back to work/school).
Does anyone here use Slashdot a lot who would know how to make a good submission to their site?
Slashdot is a very OpenSource and coder-oriented site, so I think a few of the points we should cover in a submission are:

  • Based on the OpenSource MediaWiki framework.
  • First 200,000 lyrics were added by ÜberBot.
  • Firefox/Netscape Search Plugins.
  • There is a SOAP Webservice API to let anyone create their own apps based on LyricWiki's content.
  • There are plugins for many major media players already, including Amarok and WinAmp, and standalone apps which work with Windows Media Player, foobar2000, musikCube, etc.
  • No lawsuits yet! - We are in constant talks with publishers to make sure that royalties vs. lawsuits don't become an issue.

Please leave your thoughts and let me know if you are a frequent slashdotter... thanks!
-Sean Colombo 12:22, 29 December 2006 (EST)

It's a good idea. I've never used Slashdot, but I think it will mean a lot of traffic to the site if it gets published. If you write an article for the site, maybe I could translate it and send it to the Spanish clone of Slashdot, Barrapunto (I don't know how this works either). I need to add more Spanish content if I'm doing this... mmmm
Anyway, what I wanted to get at is that you should take care of the server issues before getting traffic from Slashdot. The servers overload every day, so imagine if you have thousands of users coming at the same time... the site will die, and that is not good news... people on Slashdot will comment about it... --Unaiaia (talk) 12:27, 29 December 2006 (EST)
It would be nice to have some more users contributing, but I doubt that LyricWiki would even come close to surviving being slashdotted. I experience short outages every day. Of course, only a few sites really "survive" being slashdotted without temporary DoS. The question is more how long the wiki becomes practically unusable; remember it's a highly database, uh, based site. Maybe it would be wiser to promote LyricWiki on a lesser-known page first. Unfortunately I'm not familiar enough with English-speaking sites to name one now... The lawsuits issue is pretty hot. I wonder what will come of it if it gets discussed at Slashdot. --Lentando 15:29, 29 December 2006 (EST)
I also don't think that we are ready for a Slashdot, yet. As mentioned above, the traffic would kill our server if a story was accepted. It would also be preferable to have our first license with a publisher before we get a Slashdot story. Whenever our site is mentioned on other sites, the very first issue to come up is copyright concerns and that we will be sued into oblivion. If we can get a license for at least some of the lyrics, it would make a submission to slashdot or anywhere else much more meaningful.
- - teknomunk (talk,E) 15:33, 29 December 2006 (EST)


I noticed a strange phenomenon while watching the statistics page. While the overall number of pages increases, the number of pages the wiki counts as legitimate content pages is diminishing. Since the last one has a rather prominent place on the main page...

  • (date -- overall pp. -- content pp.)

2006-12-20 -- 261,267 -- 241,553
2006-12-22 -- 261,736 -- 241,499
2006-12-23 -- 261,990 -- 241,443
2006-12-25 -- 262,294 -- 241,378
2006-12-25 -- 262,365 -- 241,380
2006-12-29 -- 262,569 -- 240,708
Any idea what could be behind this? My guess is that the wiki software does not count new song pages with templates as content pages. --Lentando 15:44, 29 December 2006 (EST)

Part of what is happening is that the administrators (myself included) are finally getting Category:Requests For Deletion under control. This can account for a 100-200 page decrease. But even that does not explain the entire gap of 2,147 pages between the two counts (1,302 pages added overall while the content count fell by 845).
- - teknomunk (talk,E) 16:29, 29 December 2006 (EST)
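For anyone checking the numbers: the 2147 figure appears to be the overall growth plus the drop in content pages between the first and last rows of the table above. A quick Python check, using only the values posted in this thread:

```python
# First and last rows of the statistics table posted above.
first = {"overall": 261_267, "content": 241_553}  # 2006-12-20
last = {"overall": 262_569, "content": 240_708}   # 2006-12-29

overall_growth = last["overall"] - first["overall"]  # pages added overall
content_drop = first["content"] - last["content"]    # fewer pages counted as content
gap = overall_growth + content_drop                  # pages "not accounted for"

print(overall_growth, content_drop, gap)  # 1302 845 2147
```

So roughly 2,147 pages were either added without being counted as content or stopped being counted, of which deletions would explain only 100-200.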

Can someone help...

Tonight I split the mewithoutYou page into albums, but it took FOREVER to write every song name twice, once with underscores and once without. Is there a webform anywhere that will simply spit out the wiki format if I give it the album name, artist, year, and then the track names? Maybe it would take some time to build, but it would save a ton of time for a lot of people. If there's a more convenient way to do it already, please, please let me know. Noodlepaste 07:15, 31 December 2006 (EST)

Try [[MewithoutYou:In A Sweater Poorly Knit|]] (with spaces instead of underscores)
When submitting the page it will automatically convert to In A Sweater Poorly Knit.
Actually, the empty pipe will
  • strip everything before the first colon
  • remove anything in brackets at the end (e.g. album year)
  • convert the display format to wiki format in the link only — The preceding unsigned comment was added by Mischko (talkcontribs).
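To make the first two rules above concrete, here is a rough Python imitation of what the empty pipe does to a link. It is only a sketch of the behaviour as described in this thread, not MediaWiki's actual implementation; the function names pipe_trick_label and expand_pipe_trick are made up for the example.

```python
import re


def pipe_trick_label(target: str) -> str:
    """Derive the display text the empty pipe would produce for a link target."""
    # Strip everything up to and including the first colon (the artist part).
    label = target.split(":", 1)[-1]
    # Remove a trailing parenthesised part, such as an album year.
    label = re.sub(r"\s*\([^()]*\)\s*$", "", label)
    return label.strip()


def expand_pipe_trick(wikitext: str) -> str:
    """Rewrite every [[Target|]] in the text as [[Target|Label]]."""
    return re.sub(
        r"\[\[([^\[\]|]+)\|\]\]",
        lambda m: f"[[{m.group(1)}|{pipe_trick_label(m.group(1))}]]",
        wikitext,
    )


print(expand_pipe_trick("[[MewithoutYou:In A Sweater Poorly Knit|]]"))
```

On the wiki itself none of this is needed, since the software performs the expansion when the page is saved.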
Hi! I think there's no difference between writing the underscores or not... i don't do it and it works :D So i can just copy and paste the names. --Unaiaia (talk) 07:48, 31 December 2006 (EST)
I use the text editor "vi" (vim) to wikify track lists. If that's an option for you, I can explain this in detail. Anyway, it's not impossible to do this for a webform service. I think it's worth the work, you're probably not the only one, who would use it. Give me some time, I'll try. --Lentando 09:30, 31 December 2006 (EST)
Here's a webform that basically does the job (src). Don't expect too much, it's pretty dumb. Next year (haha) I'll add some empty fields/line/spaces detection/stripping and neat options like leaving alternative links unformatted and stripping everything outside quotation marks. --Lentando 18:03, 31 December 2006 (EST)
It may be stupid, but it's really useful, thanks. - (sah 18:37, 31 December 2006 (EST))
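Just so you know, this doesn't work when punctuation is included, i.e. Ayumi Hamasaki gets reformatted as Ayumi Hamasaki. I think a similar problem is occurring with the SOAP interface. If that is written in PHP, then this should work for that as well:
function camel_caps($txt) {
  // Lowercase everything, then capitalize any letter that follows a non-letter (ucwords handles the letters after whitespace and at the start).
  return ucwords(preg_replace_callback('/([^A-Za-z])([a-z])/', function ($m) { return $m[1] . strtoupper($m[2]); }, strtolower($txt)));
}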
If you intend to keep this up for some time Lentando, you may want to add the link to the help section so that people can find it easily.
- - teknomunk (talk,E) 20:07, 31 December 2006 (EST)
After some tuning I think the script is now ready for the broad public. But I would like to ask you, Teknomunk, to find a suitable place for a link on the help page. (Links same as above).
And while you're at it, -smile- maybe you can edit the {{SongFooter}} template too: there is a space missing between the Amazon link to the album name and the bullet before it. Thank you. :) --Lentando 16:55, 3 January 2007 (EST)
I put a link in both the Formatting Artists page and Formatting Albums pages. As to SongFooter, I will wait until later today when the server is not as bogged down.
- - teknomunk (talk,E) 19:55, 3 January 2007 (EST)

Reducing SOAP Failure Rate

Hi... I started logging the success/failure rate on the getSong() method in the SOAP. I was a bit disappointed since the results were around 33% good, 66% lame. I expected the results to be fairly low since there are a ton of different ways to write titles, and a lot of techno songs don't have lyrics (or even {{instrumental}} pages), but I still think it could be a lot higher.

I've done a few tricks to up the percentage a little... the main one being that artist names with redirects will work to redirect songs also (e.g. if Prodigy redirects to The Prodigy, then someone using the SOAP to look for Prodigy:Action Radar will get the result for The Prodigy:Action Radar with no problem). There are also tricks to try to make the formats the same (cutting off parentheses, etc.), but I wanted to know which songs were being missed so that I could make better algorithms to get a higher match percentage.

So today I started logging the failed requests, and I made a special page at Special:Soapfailures which lists the top 50 most requested songs that could not be found. So if we are legitimately missing pages, that would be a good place to fill them in.

...just thought I'd show everyone where that new page was and keep you up to date on the SOAP changes.
-Sean Colombo 22:09, 31 December 2006 (EST)

Thanks for the page. Do you know why it is listed under 'Restricted special pages' on Special:Specialpages? If it was built from Special:Batchmove then it may be restricted to only administrators. Never mind, it's fine.
- - teknomunk (talk,E) 23:17, 31 December 2006 (EST)
Sean, do you think that you can remove a song from the list first time it is successfully found?
- - teknomunk (talk,E) 23:56, 2 January 2007 (EST)
That's a cool idea... once the SOAP is up and working again it will have that (why is the site not really fast right now? ... the SOAP is almost entirely disabled :().
-Sean Colombo 20:11, 3 January 2007 (EST)
Another thing: as of this writing, about two fifths of the failures have no artist at all, and cannot be fixed through the wiki. There are also two entries that have ??? for the artist, and are also invalid. Both of these should return an error.
- - teknomunk (talk,E) 23:06, 3 January 2007 (EST)

I've done a good bit to improve both the speed and accuracy of the SOAP. Way too many details can be found in the blog post.
-Sean Colombo 00:53, 7 January 2007 (EST)

You mentioned in the blog that you couldn't find a good way to remove failures that have been fixed. Do you think that this would work?:
Once per hour (or so) get the list of SOAP failures, using the same algorithm as you use for Special:Soapfailures, but get 2-3 pages' worth. Check that list of songs for successes, remove successful songs from the failures table, and then generate the cache entry for the special page.
This should keep the cost a constant amount, regardless of the traffic to the site, will only process previously failed songs, and handles the page caching.
- - teknomunk (talk,E) 12:45, 7 January 2007 (EST)
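The hourly cleanup suggested above could be sketched like this. It is a Python illustration only, under several assumptions: the failures table is modelled as an in-memory Counter keyed by (artist, title), song_exists stands in for whatever lookup the SOAP service really performs, and the name refresh_failures is invented for the example. The default of 50 entries matches the size of Special:Soapfailures mentioned earlier.

```python
from collections import Counter
from typing import Callable, List, Tuple

Song = Tuple[str, str]  # (artist, title)


def refresh_failures(failures: Counter,
                     song_exists: Callable[[str, str], bool],
                     page_size: int = 50,
                     pages_to_check: int = 3) -> List[Tuple[Song, int]]:
    """Drop fixed songs from the failure table and return the new top list.

    Only the most-requested failures (a few pages' worth) are re-checked,
    so the cost per run stays constant regardless of site traffic.
    """
    candidates = failures.most_common(page_size * pages_to_check)
    for (artist, title), _count in candidates:
        if song_exists(artist, title):
            # The page exists now, so it is no longer a failure.
            del failures[(artist, title)]
    # This list is what would be cached for the special page.
    return failures.most_common(page_size)
```

A scheduled job could call refresh_failures once an hour and store the returned list as the cached content of the special page.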