Author Topic: FreeBASIC CWstr  (Read 12559 times)

0 Members and 1 Guest are viewing this topic.

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 26
Re: FreeBASIC CWstr
« Reply #195 on: April 29, 2019, 01:32:48 AM »
I've started a discussion at https://www.freebasic.net/forum/viewtopic.php?f=17&t=27569, which includes a back link to here.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 114
Re: FreeBASIC CWstr
« Reply #196 on: April 29, 2019, 11:24:40 PM »
Jeff,


Quote
FYI, hides the problem, doesn't solve it.

i agree! It makes the compiler work, but it doesnīt SOLVE the problem of the following code. Investigating this bug further shows, that "chars" after
Code: [Select]
chars = fb_wstr_CalcDiff( src, p ) + 1;
gets a value of -2147483648 where you would expect zero (p being one position "before" src should result in -1, +1 should make zero). So it seems "fb_wstr_CalcDiff" doesnīt return the expected result, if p < src.


Adding
Code: [Select]
  if (p < src)
    return NULL;

before makes it exit and return NULL, when src is all spaces. This would be expected. But "fb_wstr_CalcDiff" is used in other places as well, i didnīt look, if this (end < ini) could be a problem elsewhere too. Hope this helps...


JK

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 114
Re: FreeBASIC CWstr
« Reply #197 on: April 30, 2019, 10:59:01 PM »
Investigating this case still further, i can say that end < ini cannot occur in all other places, where fb_wstr_CalcDiff is called.

I understand the underlying logic of the code in strw_trim.c and as far as i can tell, it is correct. What isnīt correct is the result of fb_wstr_CalcDiff in case p < src. Fb_wstr_CalcDiff should return the difference between the pointers p and scr in wide characters. The corresponding code is:

Code: [Select]
return ((intptr_t)p - (intptr_t)src) / sizeof( FB_WCHAR );
As said before, this delivers correct results as long as p>src, but for p<src it fails to deliver the expected result. Iīm not an expert in C coding at all, but i keep asking myself - why this compilcated? Wouldnīt

Code: [Select]
chars = p - src + 1;
do instead of fb_wstr_CalcDiff? ... and according to my tests it does, even if p<src.


JK

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 26
Re: FreeBASIC CWstr
« Reply #198 on: May 02, 2019, 10:41:47 PM »
JK, thanks for investigating.  You have the right idea.

I think your expression: chars = p - src + 1;, is valid and safe.  gcc should be doing the right thing with the pointer arithmetic.

fb_wstr_CalcDiff provides some self documentation as to what the code is supposed to be doing, so might be good to keep the function.   
Code: [Select]
/* Calculate the number of characters between two pointers. */
static __inline__ ssize_t fb_wstr_CalcDiff( const FB_WCHAR *ini, const FB_WCHAR *end )
{
return end - ini;
}

For your interest:
fb_wstr_SkipCharRev function itself can return a pointer to one element before the first element, which is actually undefined behaviour in C, but often works because of how the C compiler handles it.  So I think I will make a small change in the logic for fb_wstr_SkipCharRev to avoid that and to get rid of the '+1' needed after the function is called.  The result will be that fb_wstr_CalcDiff won't ever see "end" pointer less than "ini" pointer to begin with. 

And, in the the original expression:
Code: [Select]
return ((intptr_t)end - (intptr_t)ini) / sizeof( FB_WCHAR );the symptoms of the problem we are seeing is in how gcc optimizes the expression by translating a division by a power of 2 to a shift instructions.  fbc does this too, but in gcc, the actual optimization is different depending on type of the divisor:
Code: [Select]
((int)end - (int)ini) / sizeof(FB_WCHAR)      '' gcc optimizes to SHR instruction (wrong!)
((int)end - (int)ini) / (int)sizeof(FB_WCHAR) '' gcc optimizes to SAR instruction (correct)
SHR instruction doesn't preserve the sign and so fails for negative values, as would be the case in (end < ini).

I don't see any reason why fb_wstr_CalcDiff must cast pointers to integers.  But that code is nearly the same since 2005, so maybe related to an old version of gcc, I don't know.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 114
Re: FreeBASIC CWstr
« Reply #199 on: May 02, 2019, 11:58:18 PM »
Jeff,

Code: [Select]
fb_wstr_CalcDiff provides some self documentation as to what the code is supposed to be doing, so might be good to keep the function.
this is, why i like line comments (comments added at the end of a line) so much. It doesnīt distract the eye when reading the code, you can easily ignore the text in green (or whatever color you prefer for comments) especially if it is properly aligned. But in case you donīt understand whatīs going on and why, such a comment will save you a lot of time, if you must revisit this code location later on.

My memory is still working quite well, but iīm not sure, if i can recall all of i know right now about this code in letīs say in 6 or twelve month. Things, which are obvious right now, will become obscure over time ...

Code: [Select]
chars = p - src + 1;                                                //calculate the difference in "characters"
solves this problem for good.

Interspersed comments should be used for important information. Too many of them make the code hard to read. But those aligned comments on the right side donīt do any harm and are quite useful for explaining things, which are of lower priority but nevertheless useful for understanding the code flow.

Analyzing someone elseīs code is always a challenge. Adding comments to places, where you finally understand, what the code does, helps preserving your effort. In other words i would very much appreciate, if you accepted adding line comments to the sources. I think many people interested in participating would benefit from more readable sources. As you can see from the work i have done, itīs not impossible to understand the sources, but in order to get the whole picture more comments and explanations would help a lot.

Please think about it,


JK

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 26
Re: FreeBASIC CWstr
« Reply #200 on: May 03, 2019, 12:34:29 AM »
The name alone, fb_wstr_CalcDiff, without any other comment or information is the documentation.  Any time this name appears, it can be known that the expression is taking difference of 2 wstr pointers.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 114
Re: FreeBASIC CWstr
« Reply #201 on: May 03, 2019, 02:59:48 PM »
Well, just coding "p - src" would make this one-line function "fb_wstr_CalcDiff" obsolete and would additionally save us the overhead of a function call. The downside is, that the information provided by the descriptive function name is lost then too - a comment would help here.

I donīt want to argue, but i think you get my point!


Just another topic we need a decision for is STRPTR. The way i coded it requires USTRING (or any clone of it) to have the data pointer in first place of the member variables. We could just make this a requirement. But what if someone doesnīt comply to this rule? Unstable, erratic and maybe crashing code would be the (undesirable) result.

So a much cleaner solution would be to make STRPTR an overloadable operator. Then specific code for returning a WSTRING PTR for the STRPTR operator had to be supplied or a compiler error (Invalid data type) would be thrown.

What do you think?


JK


later: in the meantime i know how to do it (STRPTR as overloaded operator) - works like a charm. I really think itīs better than relying on the dataīs position.
« Last Edit: May 03, 2019, 07:08:57 PM by Juergen Kuehlwein »

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 26
Re: FreeBASIC CWstr
« Reply #202 on: May 04, 2019, 05:30:52 AM »
> just coding "p - src" would make this one-line function "fb_wstr_CalcDiff" obsolete and would additionally save us the overhead of a function call

true, fb_wstr_CalcDiff has all the properties of a function: a name, parameters, type checking, a typed return value.  Except this one (like many of the functions in fb_unicode.h) is an inline function.  gcc will optimize fb_wstr_CalcDiff in line with code where it is used just 2 assembler instructions.

> So a much cleaner solution would be to make STRPTR an overloadable operator.

I think an overloadable operator was one of my thoughts also a few posts ago.  But, in the end though, STRPTR is a lot like a cast() as wstring ptr.  And one of the bugs from sf.net talks about allowing both cast() byref as wstring, cast() as wstring ptr, etc, all in one UDT, changing how fbc ranks string type matching to allow more user defined conversions.  Which made me think if we should have separate WSTRPTR & STRPTR, like we have separate WSTR and STR.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 114
Re: FreeBASIC CWstr
« Reply #203 on: May 04, 2019, 01:10:14 PM »
gcc optimization - ok, i donīt know anything about it, except that it exists.

I will have my own set of sources (with comments) and write a little utility for stripping these comments before pushing a file to the repository. This way we can have both, i can have as much comments as i like and the sources are kept "clean". Merging or rebasing will be more cumbersome then for me, but this is my problem.


WSTRPTR - i donīt see an absolute need for this, because in case of an STRPTR operator it is clear, that i want a pointer and it isnīt a matter of fbc ranking, but a matter of being defined as operator or not in an UDT. On the other hand, there already are some W... functions in FB, and WSTRPTR would make clear what kind of pointer is expected. So STRPTR alone would be sufficient, but WSTRPTR would be a clearer syntax - the more i think about it, the more i like it!

Quote
to allow more user defined conversions

I think in case of STRING/WSTRING this can cause only trouble. Where is the need for returning a STRING AND a WSTRING from the same UDT? Internally itīs an either - or, either handle the data as STRING - or handle the data as WSTRING. If the data internally is a WSTRING, converting it to STRING might spoil this data, because this isnīt a lossless conversion. The other way round would work, but it could easily be done by the user too. So allowing for returning both, might be a problem, because it can cause hard to find (WSTRING to STRING conversion under the hood) errors.


Let me know, if you want to have the code changes for making STRPTR an overloadable operator. There isnīt very much to do, just a few more lines here and there. Currently i cannot just push the sources, because i tried many things in many different places, so it isnīt obvious, what is for what.


JK