Author Topic: FreeBASIC CWstr  (Read 13693 times)

0 Members and 1 Guest are viewing this topic.

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 31
Re: FreeBASIC CWstr
« Reply #195 on: April 29, 2019, 01:32:48 AM »
I've started a discussion at https://www.freebasic.net/forum/viewtopic.php?f=17&t=27569, which includes a back link to here.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 116
Re: FreeBASIC CWstr
« Reply #196 on: April 29, 2019, 11:24:40 PM »
Jeff,


Quote
FYI, hides the problem, doesn't solve it.

i agree! It makes the compiler work, but it doesnīt SOLVE the problem of the following code. Investigating this bug further shows, that "chars" after
Code: [Select]
chars = fb_wstr_CalcDiff( src, p ) + 1;
gets a value of -2147483648 where you would expect zero (p being one position "before" src should result in -1, +1 should make zero). So it seems "fb_wstr_CalcDiff" doesnīt return the expected result, if p < src.


Adding
Code: [Select]
  if (p < src)
    return NULL;

before makes it exit and return NULL, when src is all spaces. This would be expected. But "fb_wstr_CalcDiff" is used in other places as well, i didnīt look, if this (end < ini) could be a problem elsewhere too. Hope this helps...


JK

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 116
Re: FreeBASIC CWstr
« Reply #197 on: April 30, 2019, 10:59:01 PM »
Investigating this case still further, i can say that end < ini cannot occur in all other places, where fb_wstr_CalcDiff is called.

I understand the underlying logic of the code in strw_trim.c and as far as i can tell, it is correct. What isnīt correct is the result of fb_wstr_CalcDiff in case p < src. Fb_wstr_CalcDiff should return the difference between the pointers p and scr in wide characters. The corresponding code is:

Code: [Select]
return ((intptr_t)p - (intptr_t)src) / sizeof( FB_WCHAR );
As said before, this delivers correct results as long as p>src, but for p<src it fails to deliver the expected result. Iīm not an expert in C coding at all, but i keep asking myself - why this compilcated? Wouldnīt

Code: [Select]
chars = p - src + 1;
do instead of fb_wstr_CalcDiff? ... and according to my tests it does, even if p<src.


JK

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 31
Re: FreeBASIC CWstr
« Reply #198 on: May 02, 2019, 10:41:47 PM »
JK, thanks for investigating.  You have the right idea.

I think your expression: chars = p - src + 1;, is valid and safe.  gcc should be doing the right thing with the pointer arithmetic.

fb_wstr_CalcDiff provides some self documentation as to what the code is supposed to be doing, so might be good to keep the function.   
Code: [Select]
/* Calculate the number of characters between two pointers. */
static __inline__ ssize_t fb_wstr_CalcDiff( const FB_WCHAR *ini, const FB_WCHAR *end )
{
return end - ini;
}

For your interest:
fb_wstr_SkipCharRev function itself can return a pointer to one element before the first element, which is actually undefined behaviour in C, but often works because of how the C compiler handles it.  So I think I will make a small change in the logic for fb_wstr_SkipCharRev to avoid that and to get rid of the '+1' needed after the function is called.  The result will be that fb_wstr_CalcDiff won't ever see "end" pointer less than "ini" pointer to begin with. 

And, in the the original expression:
Code: [Select]
return ((intptr_t)end - (intptr_t)ini) / sizeof( FB_WCHAR );the symptoms of the problem we are seeing is in how gcc optimizes the expression by translating a division by a power of 2 to a shift instructions.  fbc does this too, but in gcc, the actual optimization is different depending on type of the divisor:
Code: [Select]
((int)end - (int)ini) / sizeof(FB_WCHAR)      '' gcc optimizes to SHR instruction (wrong!)
((int)end - (int)ini) / (int)sizeof(FB_WCHAR) '' gcc optimizes to SAR instruction (correct)
SHR instruction doesn't preserve the sign and so fails for negative values, as would be the case in (end < ini).

I don't see any reason why fb_wstr_CalcDiff must cast pointers to integers.  But that code is nearly the same since 2005, so maybe related to an old version of gcc, I don't know.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 116
Re: FreeBASIC CWstr
« Reply #199 on: May 02, 2019, 11:58:18 PM »
Jeff,

Code: [Select]
fb_wstr_CalcDiff provides some self documentation as to what the code is supposed to be doing, so might be good to keep the function.
this is, why i like line comments (comments added at the end of a line) so much. It doesnīt distract the eye when reading the code, you can easily ignore the text in green (or whatever color you prefer for comments) especially if it is properly aligned. But in case you donīt understand whatīs going on and why, such a comment will save you a lot of time, if you must revisit this code location later on.

My memory is still working quite well, but iīm not sure, if i can recall all of i know right now about this code in letīs say in 6 or twelve month. Things, which are obvious right now, will become obscure over time ...

Code: [Select]
chars = p - src + 1;                                                //calculate the difference in "characters"
solves this problem for good.

Interspersed comments should be used for important information. Too many of them make the code hard to read. But those aligned comments on the right side donīt do any harm and are quite useful for explaining things, which are of lower priority but nevertheless useful for understanding the code flow.

Analyzing someone elseīs code is always a challenge. Adding comments to places, where you finally understand, what the code does, helps preserving your effort. In other words i would very much appreciate, if you accepted adding line comments to the sources. I think many people interested in participating would benefit from more readable sources. As you can see from the work i have done, itīs not impossible to understand the sources, but in order to get the whole picture more comments and explanations would help a lot.

Please think about it,


JK

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 31
Re: FreeBASIC CWstr
« Reply #200 on: May 03, 2019, 12:34:29 AM »
The name alone, fb_wstr_CalcDiff, without any other comment or information is the documentation.  Any time this name appears, it can be known that the expression is taking difference of 2 wstr pointers.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 116
Re: FreeBASIC CWstr
« Reply #201 on: May 03, 2019, 02:59:48 PM »
Well, just coding "p - src" would make this one-line function "fb_wstr_CalcDiff" obsolete and would additionally save us the overhead of a function call. The downside is, that the information provided by the descriptive function name is lost then too - a comment would help here.

I donīt want to argue, but i think you get my point!


Just another topic we need a decision for is STRPTR. The way i coded it requires USTRING (or any clone of it) to have the data pointer in first place of the member variables. We could just make this a requirement. But what if someone doesnīt comply to this rule? Unstable, erratic and maybe crashing code would be the (undesirable) result.

So a much cleaner solution would be to make STRPTR an overloadable operator. Then specific code for returning a WSTRING PTR for the STRPTR operator had to be supplied or a compiler error (Invalid data type) would be thrown.

What do you think?


JK


later: in the meantime i know how to do it (STRPTR as overloaded operator) - works like a charm. I really think itīs better than relying on the dataīs position.
« Last Edit: May 03, 2019, 07:08:57 PM by Juergen Kuehlwein »

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 31
Re: FreeBASIC CWstr
« Reply #202 on: May 04, 2019, 05:30:52 AM »
> just coding "p - src" would make this one-line function "fb_wstr_CalcDiff" obsolete and would additionally save us the overhead of a function call

true, fb_wstr_CalcDiff has all the properties of a function: a name, parameters, type checking, a typed return value.  Except this one (like many of the functions in fb_unicode.h) is an inline function.  gcc will optimize fb_wstr_CalcDiff in line with code where it is used just 2 assembler instructions.

> So a much cleaner solution would be to make STRPTR an overloadable operator.

I think an overloadable operator was one of my thoughts also a few posts ago.  But, in the end though, STRPTR is a lot like a cast() as wstring ptr.  And one of the bugs from sf.net talks about allowing both cast() byref as wstring, cast() as wstring ptr, etc, all in one UDT, changing how fbc ranks string type matching to allow more user defined conversions.  Which made me think if we should have separate WSTRPTR & STRPTR, like we have separate WSTR and STR.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 116
Re: FreeBASIC CWstr
« Reply #203 on: May 04, 2019, 01:10:14 PM »
gcc optimization - ok, i donīt know anything about it, except that it exists.

I will have my own set of sources (with comments) and write a little utility for stripping these comments before pushing a file to the repository. This way we can have both, i can have as much comments as i like and the sources are kept "clean". Merging or rebasing will be more cumbersome then for me, but this is my problem.


WSTRPTR - i donīt see an absolute need for this, because in case of an STRPTR operator it is clear, that i want a pointer and it isnīt a matter of fbc ranking, but a matter of being defined as operator or not in an UDT. On the other hand, there already are some W... functions in FB, and WSTRPTR would make clear what kind of pointer is expected. So STRPTR alone would be sufficient, but WSTRPTR would be a clearer syntax - the more i think about it, the more i like it!

Quote
to allow more user defined conversions

I think in case of STRING/WSTRING this can cause only trouble. Where is the need for returning a STRING AND a WSTRING from the same UDT? Internally itīs an either - or, either handle the data as STRING - or handle the data as WSTRING. If the data internally is a WSTRING, converting it to STRING might spoil this data, because this isnīt a lossless conversion. The other way round would work, but it could easily be done by the user too. So allowing for returning both, might be a problem, because it can cause hard to find (WSTRING to STRING conversion under the hood) errors.


Let me know, if you want to have the code changes for making STRPTR an overloadable operator. There isnīt very much to do, just a few more lines here and there. Currently i cannot just push the sources, because i tried many things in many different places, so it isnīt obvious, what is for what.


JK

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 31
Re: FreeBASIC CWstr
« Reply #204 on: June 01, 2019, 10:58:08 PM »
In the end, I made STRPTR(wstring) return a WSTRING PTR.  I couldn't think of any reason it should not.  To quote the fb wiki, "Note that when passed a Wstring, Operator Strptr still returns a Zstring Ptr, which may not be the desired result.".  My thought is, would it EVER be the desired result.  So I think this is a safe change.

Besides, STR(ustring), and WSTR(ustring) can still be used for specific conversions.

I feel like I have made progress over last several weekends.  I wrote a lot of tests.  The MID assignment statement was a weird one, and I spent  some time on that one.  Maybe it's bugged, I don't know, it has some peculiar behaviours.  For now I left myself a note about it.

I implemented WSTR(ustring) in a way that is close to WSTR(wstring).  When testing ustring = WSTR(ustring), I came across an issue in your implementation of the DWSTR in ustring.bi; DECLARE OPERATOR LET (BYREF pwszStr AS WSTRING PTR) clears the buffer.  And if pwszStr ptr actually points to itself, the buffer is cleared before the contents are read.  Some kind of memory-move would be better.

I think the remaining parts for me are:
- LEFT/RIGHT, which involves fixing a string related bug
- SELECT CASE, which is probably OK
- IIF, which is probably OK, with efficient logic decision
- SWAP, which I need to think about a bit.
- Parameter passing, which is probably OK, so just needs the tests written.

I think I will create a pull request for the work done so far to date.  I'll expand on next steps over at fb.net some time this weekend.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 116
Re: FreeBASIC CWstr
« Reply #205 on: June 02, 2019, 11:02:15 PM »
Jeff,


i read your post in FB-forum (please read my post there too) first and replied there first. So whatīs new is the problem with MID (could you supply code, where it shows "some peculiar behaviors"?) and that ustring = WSTR(ustring) fails. To be honest i never tested that, because such code doesnīt make much sense. On the other hand itīs not forbidden and therefore it should work! I will have a look.

José, when you read this, could you have a look too?

As far as i read your changes i like your approach better than mine. My goal was to show, that it is possible, and to make it work (somehow). Your code (no wonder) is a more logical development of the compiler.


JK

   

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 31
Re: FreeBASIC CWstr
« Reply #206 on: June 03, 2019, 01:11:08 AM »
> So whatīs new is the problem with MID (could you supply code, where it shows "some peculiar behaviors"?)

In general, the WSTRING in rtlib support is not optimized for speed.  So for most wstring functions, there is no version that will take a length parameter, and length is always calculated with a wstrlen().  Except for MID() assignment, which always assumes the buffer is large enough, if it's a pointer (not fixed length type).  Creates some situations to watch out for:
Code: [Select]
sub print_wstr( byref s as wstring )
var n = len(s)
print "    " & left( str(s) & space(30), 30);
for i as integer = 0 to n
print hex( s[i], 4 ) & " ";
next
print
end sub

scope
print "show a string"
dim w as wstring * 16 = "abcdefgh"
print_wstr w
print
end scope

scope
print "mid(wstring*n,,) overwrites null terminator"
'' initializer doesn't clear string
dim w as wstring * 16 = "Q"
print_wstr w

'' overwrites null terminator and we get garbage
mid( w, 2, 1 ) = "X"
print_wstr w
print
end scope

scope
print "mid(wstring,,) overwrites null terminator"
dim w as wstring * 16 = "R"

'' try a pointer, it doesn have fixed length limit
dim p as wstring ptr = @w
print_wstr *p

'' overwrites null terminator and we get garbage
mid( *p, 2, 1 ) = "Y"
print_wstr *p
print
end scope

scope
print "mid(wstring,) writes data beyond end of string"
dim t as wstring * 16
dim w as wstring * 16 = "0123456789abcde"
print_wstr w

dim p as wstring ptr = @w
mid( *p, 16 ) = "qrst"

print_wstr w
print_wstr t
print
end scope

The rtlib needs work with new functions to handle string length parameters.  Which should give better speed and allow for embedded null characters.  Both of which is needed for the eventual addition of builtin dyanmic wstring. 
« Last Edit: June 03, 2019, 01:34:40 AM by Jeff Marshall »

Offline Jeff Marshall

  • Newbie
  • *
  • Posts: 31
Re: FreeBASIC CWstr
« Reply #207 on: June 03, 2019, 01:15:50 AM »
> ustring = WSTR(ustring) fails

In one of your tests (tests/ustring/wstr.bas) you have:
Code: [Select]
      dim w as wstring * 50 = wchr( 1234 )
      dim u as ustring = wchr( 1234 )
      w = wstr(w)
      u = wstr(u)

In your original fbc ustring code, rtlToWstr() doesn't do anything if the argument is a "USTRING".  So it just resolves to u = u and so it calls a let operator that does check buffer address:
Code: [Select]
PRIVATE OPERATOR DWSTR.Let (BYREF cws AS DWSTR)
  IF m_pBuffer = cws.m_pBuffer THEN EXIT OPERATOR   ' // Ignore cws = cws
  this.Clear
  this.Add(cws)
END OPERATOR

In my implementation of WSTR(udt), it actually does the conversion, and so calls:
Code: [Select]
PRIVATE OPERATOR DWSTR.Let (BYREF pwszStr AS WSTRING PTR)
  this.Clear
  IF pwszStr = 0 THEN EXIT OPERATOR
  this.Add(*pwszStr)
END OPERATOR
if pwszStr points to m_pBuffer, (or first couple of bytes, looking at Clear), So the string just gets erased before the data is copied.

Offline José Roca

  • Administrator
  • Hero Member
  • *****
  • Posts: 2516
    • José Roca Software
Re: FreeBASIC CWstr
« Reply #208 on: June 03, 2019, 01:34:49 PM »
Quote
The rtlib needs work with new functions to handle string length parameters.  Which should give better speed and allow for embedded null characters.  Both of which is needed for the eventual addition of builtin dyanmic wstring.

You seem to be thinking about BSTRings, that carry its length with them. They're a different data type. FreeBasic's WSTRING is a null terminated unicode string (maybe it should have been named ZWSTRING or WSTRINGZ to avoid confussions); therefore, they can't have embedded nulls.

Offline Juergen Kuehlwein

  • Full Member
  • ***
  • Posts: 116
Re: FreeBASIC CWstr
« Reply #209 on: June 03, 2019, 10:52:54 PM »
Jeff,

so, if you  add this line:
Code: [Select]
IF m_pBuffer = cast(ubyte ptr, pwszStr) THEN EXIT OPERATOR   'Ignore self-assignto:
Code: [Select]
PRIVATE OPERATOR DWSTR.Let (BYREF pwszStr AS WSTRING PTR)
  IF m_pBuffer = cast(ubyte ptr, pwszStr) THEN EXIT OPERATOR   'Ignore self-assign

  this.Clear
  IF pwszStr = 0 THEN EXIT OPERATOR
  this.Add(*pwszStr)
END OPERATOR

it should work again. Could you please test?


JK
« Last Edit: June 03, 2019, 11:01:47 PM by Juergen Kuehlwein »