Simplepie vs. Magpie: An RSS Parser Shootout (Updated!)
I parse RSS a lot. My newsbot (which automatically finds new news sources) parses around 12000 feeds each day and heavily filters the results via Bayes and hidden Markov. If I can shave a few seconds of runtime off somewhere, I'm willing to try it, because the scripts run on a 266 MHz P2 machine (which consumes nearly no power at all).
After reading all the buzz about Simplepie (“Faster than a speeding bullet”) I decided to give it a try. After trying to break both pies (using invalid feeds, invalid unicode and so on) I decided to benchmark them. I tested Magpie 0.72 and Simplepie 1.0b2.
Magpie sometimes has problems with unicode, seems to have no documentation at all, parses even really fucked-up feeds, and is widely used. You can also feed it raw text, not only URLs. It uses the widely used Snoopy class to fetch HTTP data. It caches serialized data as a file (the filename is the MD5 hash of the feed URL).
Simplepie seems to cope with unicode in all cases, has bloated but existing documentation and seems to parse most of the trash I gave it. It only works on a URL basis, with no chance to use proxies or Last-Modified tricks. It uses fopen(). It caches serialized data as a file (the filename is the urlencoded feed URL). It has been hyped on digg and elsewhere.
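To make the difference between the two cache-naming schemes concrete, here is a minimal sketch in Python (not the PHP of either library). The function names are made up for illustration, and whether either class adds a prefix or file extension to the name is not covered here.

```python
import hashlib
from urllib.parse import quote_plus


def md5_cache_name(feed_url: str) -> str:
    """Fixed-length name: hex MD5 digest of the feed URL (the Magpie-style scheme)."""
    return hashlib.md5(feed_url.encode("utf-8")).hexdigest()


def urlencoded_cache_name(feed_url: str) -> str:
    """Variable-length name: the feed URL itself, percent-encoded (the Simplepie-style scheme)."""
    return quote_plus(feed_url)


if __name__ == "__main__":
    url = "http://del.icio.us/tag/ninja"
    print(md5_cache_name(url))         # always 32 hex characters
    print(urlencoded_cache_name(url))  # grows with the length of the URL
```

The hashed name has a constant length no matter how long the URL is, while the urlencoded name stays human-readable but can get long for feeds with big query strings.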
The Test
These are the two test files I used (exported from the newsbot):
These are the two test scripts I ran:
Results
| | Magpie | Simplepie |
|---|---|---|
| test1 | 25 seconds | ??? seconds |
| test2 | 5 seconds | 67 seconds |
I stopped test1 for Simplepie after 5 minutes, because it consumed way too much memory and hogged my machine heavily.
Looks like Simplepie is more like “Slower than a mule on dope”. Magpie is so much faster than Simplepie that it disqualifies Simplepie. I wonder whether anyone who hyped Simplepie as turbo-fast has ever tried to benchmark it against Magpie. Try it yourself; all the files you need are above. These results are uncached. I did not compare cached results because both classes use the same caching method.
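If you want to run a comparison like this yourself, a small harness that records wall-clock time and peak memory for each parser is enough. The following is only a sketch in Python, not one of the PHP test scripts: `parse_titles` is a stand-in parser and `SAMPLE_FEED` is a made-up two-item feed, both hypothetical.

```python
import time
import tracemalloc
import xml.etree.ElementTree as ET

# Made-up two-item feed; swap in the contents of a real test file.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>demo</title>
<item><title>one</title><link>http://example.com/1</link></item>
<item><title>two</title><link>http://example.com/2</link></item>
</channel></rss>"""


def parse_titles(xml_text):
    """Stand-in parser: returns the item titles of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [item.findtext("title") for item in root.iter("item")]


def benchmark(parser, xml_text, runs=5):
    """Return (best wall-clock seconds, peak traced bytes, last result)."""
    best = float("inf")
    peak = 0
    result = None
    for _ in range(runs):
        tracemalloc.start()  # tracing adds overhead, so compare like with like
        start = time.perf_counter()
        result = parser(xml_text)
        elapsed = time.perf_counter() - start
        _, run_peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best = min(best, elapsed)
        peak = max(peak, run_peak)
    return best, peak, result


if __name__ == "__main__":
    seconds, peak_bytes, titles = benchmark(parse_titles, SAMPLE_FEED)
    print(f"{len(titles)} items, {seconds:.6f} s, {peak_bytes} bytes peak")
```

Taking the best of several runs reduces noise from whatever else the machine is doing. Feeding every parser the same in-memory text keeps network time out of the measurement.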
I tried to rip some code out of magpie and simplepie, hoping I could combine it with my own parser to get something fast and unicode-reliable. But after a few minutes of thought (while cooking) I started SharpDevelop and wrote some C# code to replace my PHP cronjob entirely (the one that parses, filters, accesses MySQL and so on). Now I'm left with a cronjob that has no problems with unicode at all (and that without any dirty tricks), runs for about 25 seconds and is started every 5 minutes (instead of running for more than 5 minutes every 20 minutes). I should have done this months ago.
UPDATE!
The authors of simplepie reacted to my test. Let's check their claims, one by one, against what reality tells us.
"A blog posting at http://codeninja.de/[...] caused some FUD to be spread around about SimplePie's performance."
FUD ("Fear, uncertainty, doubt") is a marketing strategy (see Wikipedia), so they are probably claiming that I wrote my article only to make simplepie look bad. The only reason I wrote this article was to comment on the disproportionate claims on the simplepie website ("Faster than a speeding bullet"). If you claim to be this fast, you have to be faster than any competitor, not slower. Otherwise you should think about changing your over-the-top marketing arguments.
"The feeds he DID test were remarkably 'jacked-up'... to the point that you're not likely to come across in 99% of test cases."
The test data is a totally random sample of the RSS data you get when you visit these feeds:
http://technorati.com/tag/ninja
http://del.icio.us/tag/ninja
http://news.google.com/news?q=ninja&ie=UTF-8
http://blogmarks.net/tag/ninja
http://search.msn.de/results.aspx?FORM=MSNH&CP=1252&q=ninja
http://search.yahoo.com/search?p=ninja
http://blogs.icerocket.com/search?q=ninja
http://feedster.com/search.php?q=ninja&sort=date&ie=UTF-8&hl=&content=full&&limit=15
http://www.blogdigger.com/search?q=ninja
http://www.plazoo.com/search/ninja.htm
http://blogg.de/tag/ninja.htm
http://www.findarticles.com/p/search?qt=ninja&qf=free&tb=art
http://www.furl.net/furled.jsp?topic=ninja
http://flickr.com/photos/tags/ninja
http://blogsearch.google.com/blogsearch?hl=en&q=ninja&btnG=Search+Blogs
http://video.google.com/videosearch?q=ninja+is%3Afree&page=1&lv=0&so=0
I would say that this very much reflects the reality a parser has to face in the depths of the internet. I don't know why they claim that this data is so extremely special that you come across it in only 1% of all feeds. To me it even sounds like they are suggesting I created this data only to make simplepie look bad (FUD, you know?).
Simply aggregate the listed URLs yourself, pick some random entries, and you will end up with a test feed like the one I generated. I repeat: this is what reality gives us.
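Building such a test feed can be sketched roughly as follows. This is a Python illustration, not the newsbot's actual export code. It assumes the inputs are plain RSS 2.0 without namespaces (Atom or RDF feeds would need extra handling), and `FEED_A`, `FEED_B` and both function names are made up.

```python
import random
import xml.etree.ElementTree as ET

# Two tiny made-up feeds standing in for the downloaded search-result feeds.
FEED_A = """<rss version="2.0"><channel><title>a</title>
<item><title>a1</title></item><item><title>a2</title></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel><title>b</title>
<item><title>b1</title></item><item><title>b2</title></item>
</channel></rss>"""


def sample_items(feed_documents, count, seed=None):
    """Pick up to `count` random <item> elements from a list of RSS 2.0 strings."""
    rng = random.Random(seed)
    pool = []
    for doc in feed_documents:
        pool.extend(ET.fromstring(doc).iter("item"))
    return rng.sample(pool, min(count, len(pool)))


def build_test_feed(items, title="test feed"):
    """Wrap the chosen items in a fresh RSS 2.0 channel and serialize it."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    chosen = sample_items([FEED_A, FEED_B], count=3, seed=1)
    print(build_test_feed(chosen))
```

Because the items are picked at random from everything the feeds returned, the result is not tuned for or against any particular parser.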
This time I even included a copy of kottke.org's feed (copy) to disprove the claim that my feed examples were too jumbled.
"He only tested pure speed between SimplePie and MagpieRSS... nothing else (there are other important factors besides speed alone)."
And the factors are? Both classes use PHP arrays to access the data (after parsing), iconv and the like to convert the data (while parsing), serialization for caching, and so on. I'm really interested in what dark magic is lurking in the depths of magpie or simplepie that can heavily influence the footprint of normal usage (that is: parsing or reading the cache, then outputting it or writing it to a database).
"I'd also like to see the shootout between more than just SimplePie and MagpieRSS. If we want to stick to PHP, we could go with SimplePie, MagpieRSS, CaRP/Grouper, and lastRSS."
I did include lastrss this time (testfile), but not CaRP, because the author of CaRP wants me to subscribe to "7 ways to turn RSS into R$$" (which I'm surely not doing) before I can download it. I'm very sure that lastrss is about the top parsing speed you can achieve, because it's pretty minimalistic and featureless.
Results (again)
Anyway, I benchmarked again, and I can't reproduce the numbers they list on their website. I used the same PHP and RSS files listed above on a (100% idle) machine with PHP 5.2, and below are the results I keep getting.
| | simplepie_1.0beta3 | magpie0.72 | lastrss0.9.1 |
|---|---|---|---|
| rss1 | 15.539 seconds, 1.601.120 bytes | 2.327 seconds, 1.037.576 bytes | 1.967 seconds, 772.432 bytes |
| rss2 | 1.677 seconds, 170.512 bytes | 0.439 seconds, 144.168 bytes | 0.346 seconds, 123.216 bytes |
| kottke | 810 ms, 73.120 bytes | 157 ms, 76.400 bytes | 88 ms, 69.720 bytes |
To be honest: this was to be expected. Seeing massive speed improvements in software that stepped up one beta notch is not very common. And this time I won't just repeat myself; I'm taking the unusual step of saying: DON'T TRUST MY NUMBERS, TRY IT YOURSELF!
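For anyone who prefers ratios to raw seconds, the timings in the table above work out to simplepie being roughly 4 to 7 times slower than magpie. A small Python snippet shows the arithmetic; the numbers are copied from the table and `slowdown` is just a helper name chosen here.

```python
# Timings in seconds, copied from the results table (810 ms = 0.810 s, and so on).
timings = {
    "rss1":   {"simplepie": 15.539, "magpie": 2.327, "lastrss": 1.967},
    "rss2":   {"simplepie": 1.677,  "magpie": 0.439, "lastrss": 0.346},
    "kottke": {"simplepie": 0.810,  "magpie": 0.157, "lastrss": 0.088},
}


def slowdown(row, slow, fast):
    """How many times longer `slow` took than `fast` on one test feed."""
    return row[slow] / row[fast]


if __name__ == "__main__":
    for feed, row in timings.items():
        vs_magpie = slowdown(row, "simplepie", "magpie")
        vs_lastrss = slowdown(row, "simplepie", "lastrss")
        print(f"{feed}: {vs_magpie:.1f}x magpie's time, {vs_lastrss:.1f}x lastrss's time")
```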
"Even if SimplePie sucks in an area, it would let us know what areas to work on, so having a real, valid, FUD-less shootout would be in our best interests."
Yes. You should work on your code instead of claiming that the test data was unfair, or trying to damage my credibility by claiming the main aim of this article was to spread FUD about simplepie.
Comments
[ ReaderX | 05.02.07 18:20 ]What is the date of this post? No idea if it's current or ancient.
[ Justin | 11.03.07 02:40 ]Thanks for the great research. Great article. Just look at the versions of the software and you can tell how current this article is.
[ mnt | the codeninja, 12.03.07 21:54 ]As of March 2007:
Magpie: here=0.72, current=0.72
Simplepie: here=1.0b2, current=1.0b3
[ Ryan Parman | 29.03.07 00:26 ]To help combat some of the FUD that this article has caused for SimplePie, I've put together a few tests comparing SimplePie (Beta 2, Beta 3, and trunk) and MagpieRSS (0.72, 2.0-alpha-alpha-alpha).
Hopefully this will allow people to see a better comparison between the two. In truth, the performance is comparable, yet SimplePie offers better compatibility, better standards support, and more flexibility via optional configuration.
http://php5.simplepie.org/test/codeninja.de/
[ mnt | the codeninja, 31.03.07 23:56 ]Ryan: I responded to your claims, see above. Looks like simplepie is as slow as claimed previously, no improvements with beta3.
[ Jose Ferrer | 05.04.07 03:04 ]Thanks. I was thinking of using SimplePie in a major project.
This article was extremely helpful.
[ Geoffrey Sneddon | 09.04.07 18:34 ]Running the tests myself, MagpieRSS returns 475 items, whereas SimplePie returns 500 items. I also, at the slowest, didn't find 1.0b3 to be that slow, but rather only 3 times slower with the stock settings (if you turn off features that MagpieRSS doesn't have, and add things to the MagpieRSS test script for features that can't be turned off (e.g. htmlspecialchars() on textual content such as RSS <title> elements), the difference is smaller still).
As for the claims that it is unfair: how many real world sites have 500 item feeds? The vast majority by a heckuva long way have 10 items.
[ nicola | 14.04.07 00:08 ]I also tried it and found that yes, simplepie is slower. My speculation is that he tried 500 items to find out which one parses faster, much like parsing 50 feeds with 10 items each.
[ mnt | the codeninja, 15.04.07 17:00 ]Geoffrey: Which features did you turn off?
Anyway: if you are trying to tell me that simplepie is only 3 times slower than magpie if you turn certain extra features off... isn't this missing the whole point? Something that is advertised as "faster than magpie" and "faster than a speeding bullet" should really be faster, and not walk out of a shootout where its victory is measured in "times slower" (= multiples of 100%).
[ Neo | 31.05.07 20:41 ]Great research, though you have put me in a new mental phase: which one to go for... I have no idea.
I'm trying to integrate either magpie or simplepie into a web application (great work though), but it seems I'll have to test them both myself for reliability and speed.
[ mnt | the codeninja, 01.08.07 17:28 ]What about... http://pear.php.net/packages.php?catpid=22&catname=XML
[ Jerry | 11.08.07 01:25 ]Okay, I know I just completely jacked this thread, and I do appreciate this research, but I find it hilarious that when I searched in Google for "Simple Pie Versus" the second listing was "Mrs.Smith's Pumpkin Pie Versus Semi Homemade Pumpkin Pie" . . . Hmm. I wonder if Mrs. Smith's pie crust rises faster or slower than the generic competition's. I think it's time for a test . . .
[ kad | 02.09.07 09:48 ]Just did a quick re-match using simplepie 1.0.1 and magpie 0.72, download and see for yourself: http://kad.blegh.net/data/files/jaks/simple_vs_mag.tar.bz2
[ kad | 02.09.07 09:49 ]... or you could just try it right now in this temp page:
http://labs.blegh.net/simple_vs_mag/
[ Lourenzo Ferreira | 02.10.07 10:59 ]... Which is the best feed pie? MagPie or SimplePie?
[ David | 15.12.07 16:06 ]Thanks for the nice research and good read.
[ Mark | 10.01.08 05:19 ]I plan to replace MagpieRSS in my application, so I redid the test. Here are the results:
simplepie(first time):
Time: 7.50597secs
Peak Memory Usage: 6667456 bytes
MagpieRSS(first time):
Time: 17.103986 secs
Peak Memory Usage: 2298320 bytes
simplepie(second time with cache):
Time: 0.027497 secs
Peak Memory Usage: 4367264 bytes
MagpieRSS(second time with cache):
Time: 0.00862 secs
Peak Memory Usage: 1910264 bytes
The results look promising. If I can find a way to reduce the memory usage, I will move to SimplePie without a second thought....
If not, then I will stick with MagpieRSS longer :(
* The version of MagpieRSS is 0.72 and the version of SimplePie is 1.1
** I put test1.rss on my localhost to reduce the fetching time.
*** test1.rss seems to have some problem; I had to re-save it so that SimplePie could parse it.
**** I turned on SimplePie's dirty fast option, so it works more like MagpieRSS.
Mark
[ Mark | 10.01.08 05:23 ]Sorry, not the "dirty fast" option, it is the "s t u p i d l y fast" option.
** I have to put spaces inside the word, or it will be filtered ...
Mark
[ Mark | 10.01.08 05:43 ]Another mistake. The MagpieRSS(first time) result is wrong. The following is the correct one:
MagpieRSS(first time):
Time: 12.83128
Peak Memory Usage: 2340528 bytes
Mark
[ zack2zack | 23.01.08 08:56 ]LASTRSS roxs ... fulfills my needs :)
I have also use Refeed using magpie engine gives very satisfying result.Going to try SimplePie soon..
Thanks codeninja and all authors of SimplePie, MagpieRSS, and lastRSS.. your efforts & time ... really appreciated.
Zack
[ mnt | the codeninja, 28.01.08 21:21 ]http://www.google.com/search?q=failed+in+%2Fsimplepie.inc&ie=utf-8&oe=utf-8
omg.
[ amado | 12.02.09 04:59 ]While working on a project for a client, I went with Magpie, but it failed due to some unicode issues. I then tried simplepie, but it failed to even parse the RSS feed (a 3 MB feed) and crashed my script due to memory limits. I raised the memory limit and... guess what... now it takes longer to crash.