torkell: (Default)
[personal profile] torkell
I'd forgotten just how primitive the string parsing in C is.

One bug in Semagic is to do with the music detection for XMPlay: if a track has an Artistsort field and that happens to be returned before the Artist field, then Semagic will pick up that one instead. Actually it's more fun than that, because it'll actually include a few extra characters and return something like "ort Enigma".

XMPlay provides this information in one massive string where each tag is on its own line and has a tab separating the name and the value. Semagic searches for a tag by comparing the start of each line against the tag name, which is where the bug occurs as if you're only comparing the first 6 characters then "artist" and "artistsort" are equal.

Now, if I was coding this up from scratch I'd parse the entire string and dump it into a hash map. Probably with something like this if I was using Java:

String[] lines = input.split('\n');
for (String line in lines) {
  String[] tokens = line.split('\t', 2);
  if (tokens.length == 2) {
    tagMap.put(tokens[0], tokens[1]);
  }
}


But unfortunately I'm using C and so have to roll my own string manipulation. I also have to do my own memory management which is even more annoying... and I can't be bothered with all that malarkey, so I'll try bodging a fix by checking if the field name is followed by a tab. A quick recompile... and it works!

If anyone's interested, here's the patch (apply it against v1.8.0.1):


Huh. I've spotted another bug - Semagic picks the first matching entry, but there can be multiple ones. In particular if a MP3 file has both ID3v1 and ID3v2 tags then it'll find the ID3v1 tag which could very well be truncated (because somebody thought 30 characters would be enough for anybody). That'll be much trickier to fix without rewriting the string parsing.

Date: 2016-10-13 05:06 am (UTC)
From: [identity profile] olego.livejournal.com
-if (!_tcsncicmp(token,p,token_len) && !_tcsncmp(p + token_len, _T("\t"), 1)){
+if (!_tcsncicmp(token,p,token_len) && p[token_len] == _T('\t')) {

A bit more legible ;-).

Date: 2016-10-13 06:28 pm (UTC)
From: [identity profile] olego.livejournal.com
In the Visual C++ team, we're still fairly close to the hardware. C++ is now a lot better about strings, it has the class string with a .length() function, but it's still fairly clumsy to do things like string.replaceAll:

http://stackoverflow.com/questions/5343190/how-do-i-replace-all-instances-of-a-string-with-another-string

And super-clumsy to do =~ s/A/B/:

http://www.cplusplus.com/reference/regex/regex_replace/

January 2026

S M T W T F S
     123
45678910
11121314151617
18192021222324
25262728293031

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated May. 9th, 2026 06:06 am
Powered by Dreamwidth Studios