Discussion:
[Firebird-devel] Firebird Transaction ID limit solution
Yi Lu
2011-12-24 00:37:08 UTC
Permalink
In the past week, a few databases from several customers have run into the
2^31-1 transaction ID limit. After some research I found that this problem
has been known for about five years. Some suggestions were given, such as in
this thread,
http://firebird.1100200.n4.nabble.com/discussion-about-real-fix-for-CORE-2348-td3478013.html
but they require downtime.

While we've made efforts to reduce the number of transactions at the
application level, we still need a thorough solution.

1. I wonder if anyone has a proper solution to this issue that does not
require shutting down the system.

2. If we have no option other than modifying the Firebird source code, can
anyone shed some light on how complicated this change would be and how widely
it would affect the code? Any advice on where to make the changes would be
appreciated.



Dimitry Sibiryakov
2011-12-24 09:29:31 UTC
Permalink
24.12.2011 1:37, Yi Lu wrote:
> 1.I wonder if anyone now has a proper solution to this issue, without
> requireing shutting down the system.

Set up a cluster. While one node is down for maintenance, users can work with the other(s).

--
SY, SD.
Leyne, Sean
2011-12-25 00:00:31 UTC
Permalink
Dimitry,

A cluster won't help, since all nodes would see the same number of transactions, and that number is the problem.


Sean

________________________________________
From: Dimitry Sibiryakov [***@ibphoenix.com]
Sent: Saturday, December 24, 2011 4:29 AM
To: For discussion among Firebird Developers
Subject: Re: [Firebird-devel] Firebird Transaction ID limit solution

24.12.2011 1:37, Yi Lu wrote:
> 1.I wonder if anyone now has a proper solution to this issue, without
> requireing shutting down the system.

Set up cluster. While one node is down for maintenance, users can work with other(s).

--
SY, SD.
Dimitry Sibiryakov
2011-12-25 09:39:35 UTC
Permalink
25.12.2011 1:00, Leyne, Sean wrote:
> A cluster won't help, since all databases would see the same number of transactions, the number of which is the problem.

No, not if several transactions are merged into one during transfer between nodes.
Besides, there is no need for all nodes to start from the same counter value.

--
SY, SD.
Dmitry Kuzmenko
2011-12-24 12:58:12 UTC
Permalink
Hello, Yi!

Saturday, December 24, 2011, 4:37:08 AM, you wrote:

YL> In the past week, we have a few databases from several customers run out of
YL> the 2^31-1 transaction ID limit. After some research I found this problem to
YL> have been around since five years ago. Some suggestions were given such as
YL> this thread,
YL> http://firebird.1100200.n4.nabble.com/discussion-about-real-fix-for-CORE-2348-td3478013.html
YL> but they requires off-time.

YL> While we've made our efforts to reduce number of transactions from
YL> application level, we still need a thorough solution.

How long does a database "live" until it reaches the transaction limit?
How many users work with the databases, and for how long during the day?
How many transactions do you have each day?
Can you give gstat -h information?

YL> 1.I wonder if anyone now has a proper solution to this issue, without
YL> requireing shutting down the system.

You can't reset transaction numbers without doing a backup/restore.

YL> 2. If we have no option other than modifying Firebird source code, can
YL> anyone shed some light on how complicated this change will be and how wide
YL> the change would affect? Any advice on where to make the code change will be
YL> appreciated.

What Firebird version do you use? Extending transaction numbers from
32 to 64 bits is not only a change to the sources, but also a change to the
ODS.
So this will not be done in any existing version, including 2.5.

Extending transaction numbers could be done in Firebird 3.0, but
I haven't found this in its latest roadmap.

--
Dmitry Kuzmenko, www.ibase.ru, (495) 953-13-34
Yi Lu
2011-12-28 15:37:58 UTC
Permalink
One of our busy sites runs at more than 120 TPS and cannot last longer than 6
months. The database is online 24/7, with the number of simultaneous users
ranging from 0 to 50. Currently, its transaction counter is at 2,089,589,687.

By calculation, an average of 39 transactions per second (TPS) corresponds
to about 2 years of life. 39 TPS is not unrealistically intensive these days.
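
For anyone who wants to double-check the arithmetic, here is a quick
back-of-envelope sketch (it assumes a perfectly uniform load with no quiet
hours, which real sites of course don't have):

#include <cstdio>

int main()
{
    const double idLimit = 2147483647.0;            // 2^31 - 1 transaction IDs
    const double secondsPerYear = 365.0 * 86400.0;
    const double rates[] = { 120.0, 39.0 };         // TPS figures from above

    for (double tps : rates)
        std::printf("%5.0f TPS -> %.2f years until the limit\n",
                    tps, idLimit / tps / secondsPerYear);
    return 0;
}

This prints roughly 0.57 years (about half a year) for 120 TPS and 1.75 years
for 39 TPS, in line with the figures above.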


Yi Lu
2011-12-28 16:34:32 UTC
Permalink
Is it feasible to roll over the transaction ID without putting the database
offline, i.e. when the ID is close to the limit, reset it to 0 from the code?

Dimitry Sibiryakov
2011-12-28 16:44:47 UTC
Permalink
28.12.2011 17:34, Yi Lu wrote:
> It is feasible to roll over the transaction ID without putting the database
> offline? i.e. when ID is close to limit, reset it to 0 from the code?

It would result in complete data loss. "DROP DATABASE" has the same result, but is easier to do.

--
SY, SD.
Pierre Y.
2011-12-28 16:48:34 UTC
Permalink
On Wed, Dec 28, 2011 at 5:34 PM, Yi Lu <***@ramsoft.biz> wrote:
> It is feasible to roll over the transaction ID without putting the database
> offline? i.e. when ID is close to limit, reset it to 0 from the code?

It seems that the PostgreSQL autovacuum daemon does that for PostgreSQL databases:

http://www.postgresql.org/docs/8.3/static/routine-vacuuming.html

"PostgreSQL's VACUUM command has to run on a regular basis for several reasons:
...
To protect against loss of very old data due to transaction ID wraparound"
Leyne, Sean
2011-12-27 21:17:00 UTC
Permalink
Dimitry,

> 27.12.2011 19:17, Leyne, Sean wrote:
> > That type of solution is not what I would define as a cluster.
>
> As you wish. But the rest of world consider this kind of system to be called
> "a shared-nothing cluster".

You are correct! "a shared-nothing cluster" is a type of cluster, one that is used for a number of database cluster solutions.

I should have said:

"That type of solution is not what immediately comes to mind for me, since I see a shared disk solution (using redundant SAN storage) to be much easier to implement for FB."


Sean
Dimitry Sibiryakov
2011-12-27 21:31:05 UTC
Permalink
27.12.2011 22:17, Leyne, Sean wrote:
> I should have said:
>
> "That type of solution is not what immediately comes to mind for me, since I see a shared disk solution (using redundant SAN storage) to be much easier to implement for FB."

Unfortunately, a shared-storage cluster doesn't solve the transaction limit problem.
BTW, a distributed lock manager isn't a trivial thing either. And it is quite slow.

--
SY, SD.
Leyne, Sean
2011-12-28 18:51:06 UTC
Permalink
> It is feasible to roll over the transaction ID without putting the database
> offline? i.e. when ID is close to limit, reset it to 0 from the code?

In theory there is a "simple" code change which would give you another 6 months of breathing room.

But there is no permanent solution currently available, and none that will be available in 6 months. You will need to perform a backup/restore at some point.

The "simple" solution is to change the datatype of transaction-related variables from SLONG* to ULONG. The reality of this solution is, unfortunately, far uglier given the testing required to confirm that all references have been changed.


Sean


* For the life of me I don't understand why signed types were used for variables which could only contain positive values -- this is a common problem throughout the codebase.
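
To make the scope of that "simple" change a bit more concrete, here is a
minimal sketch of the typedef-based approach (the alias name TraNumber and the
simplified typedefs are illustrative only, not a claim about what the real
headers contain):

// Illustrative sketch only -- not the actual Firebird headers.
typedef int SLONG;             // simplified stand-in for Firebird's signed 32-bit alias
typedef unsigned int ULONG;    // simplified stand-in for the unsigned 32-bit alias

// Step 1 (no ODS change): route every transaction id through one alias and make
// it unsigned, which doubles the usable range to roughly 4.29 billion IDs.
typedef ULONG TraNumber;       // hypothetical name, for illustration

// Step 2 (ODS change required): widen the alias to 64 bits at some later point.
// typedef unsigned long long TraNumber;

// The mechanical part is replacing declarations such as
//     SLONG transaction;
// with
//     TraNumber transaction;
// The ugly part is the audit mentioned above: printf-style format strings,
// comparisons against negative sentinel values, and every on-disk structure
// that still stores the id in a 32-bit field.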
Alex Peshkoff
2011-12-29 08:50:00 UTC
Permalink
On 12/28/11 22:51, Leyne, Sean wrote:
>> It is feasible to roll over the transaction ID without putting the database
>> offline? i.e. when ID is close to limit, reset it to 0 from the code?
> In theory there is a "simple" code change which would provide you another 6 months breathing room.
>
> But there is no permanent solution which is currently available, and none that will be available in 6 months. You will need to perform a backup/restore at some point.
>
> The "simple" solution is to change the datatype of Transaction related variables from SLONG* to ULONG. The reality of this solution is, unfortunately, far uglier given the testing required to confirm that all references have been changed.
>
>
> Sean
>
>
> * for the life of me I don't understand why signed types where used for variables which could only contain positive values -- this is a common problem which is throughout the codebase.

This change was done in trunk. The reason for the signed type here (at
least the visible reason) was very simple: the same parameter of some
functions might be passed either a transaction number (a positive value) or
something else (a negative value). I.e., a negative sign meant 'this is not a
transaction' and the function behaved accordingly.
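
Purely to illustrate the pattern described above (the function and the type
alias below are invented for this example, not taken from the codebase):

#include <cstdint>

using SLONG = std::int32_t;   // simplified stand-in for Firebird's SLONG

// One parameter doing double duty: the sign carries extra meaning.
void handle_number(SLONG number)
{
    if (number < 0)
    {
        // negative value: "this is not a transaction", treat it as a marker
        return;
    }
    // non-negative value: treat it as a transaction id
    // ... look up the transaction's state, etc.
}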

Please do not treat this as advice to use trunk in production!!!
Leyne, Sean
2011-12-29 18:05:26 UTC
Permalink
Alex,

> > The "simple" solution is to change the datatype of Transaction related
> variables from SLONG* to ULONG. The reality of this solution is,
> unfortunately, far uglier given the testing required to confirm that all
> references have been changed.
> >
> >
> > Sean
> >
> >
> > * for the life of me I don't understand why signed types where used for
> variables which could only contain positive values -- this is a common
> problem which is throughout the codebase.
>
> This change was done in trunk.

To be clear, you have a code-branch that uses ULONG for transaction ID?


> The reason for signed type here was (at least visible reason) very simple
> - into some functions using same parameter might
> be passed transaction number (positive value) and something else (negative
> value). I.e. negative sign meant 'this is not transaction' and function behaved
> according to it.

And some people have complained about some of my suggestions as being "hacks"!!!


Sean
Ann Harrison
2011-12-29 22:57:02 UTC
Permalink
Sean,

> Alex wrote:

>>  - into some functions using same parameter might
>> be passed transaction number (positive value) and something else (negative
>> value). I.e. negative sign meant 'this is not transaction' and function behaved
>> according to it.
>
> And some people have complained about some of my suggestions as being "hacks"!!!
>

Actually, there is at least one other "special" value for transaction ids.
Zero is always the system transaction. If you consider that a signed long is
"retirement proof", which we did, using the other half for something else
doesn't seem so bad. Particularly if you cut your programming teeth in a 64Kb
address space.

Maybe it's time to look at all the small integers as well.

Cheers,

Ann
Dmitry Yemanov
2011-12-30 06:33:41 UTC
Permalink
29.12.2011 22:05, Leyne, Sean wrote:
>
>> This change was done in trunk.
>
> To be clear, you have a code-branch that uses ULONG for transaction ID?

Trunk is the ongoing development branch (formerly known as HEAD in CVS),
i.e. transaction IDs are already unsigned long in FB 3.0.


Dmitry
Leyne, Sean
2011-12-30 20:43:37 UTC
Permalink
Dmitry,

> Trunk is the ongoing development branch (formerly known as HEAD in CVS),
> i.e. transaction IDs are already unsigned long in FB 3.0.

Although this would mean a further ODS change as well as an increase in the overhead associated with all rows, perhaps the transaction ID size in the ODS should be increased from 4 bytes to 5 bytes to remove any possible likelihood of overflowing the max value (2^40 = 256 transactions per second, continuously, for over 136 years!)


Sean
Kjell Rilbe
2011-12-30 22:10:21 UTC
Permalink
On 2011-12-30 21:43, Leyne, Sean wrote:
> Dmitry,
>
>> Trunk is the ongoing development branch (formerly known as HEAD in CVS),
>> i.e. transaction IDs are already unsigned long in FB 3.0.
> Although this would mean a further ODS change as well as an increase in overhead associated with all rows, perhaps the ODS size should be increased from 4 bytes to 5 bytes to remove any possible likelihood of overflowing the max value (=256 transactions per sec, continuously for over 136 years!)
Who knows what will happen within only 5-10 years? Perhaps in five
years it will be common to have systems running a few thousand
transactions per second? In that case 40 bits will only suffice for
about 11 years (at 3000 transactions/sec).

If such a "big" change is to be made, I suggest making it at least 48
bits, and why not 64 bits while we're at it? At least I was under the
impression that the snag is not the size and the space it takes on disk, but
rather that the transaction id type is used in so many places that the change
is high risk. So, if it's to be changed at all, make sure the change is
large enough to last for a long time.

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Alexander Peshkov
2011-12-31 13:57:21 UTC
Permalink
On Fri, 30/12/2011 at 23:10 +0100, Kjell Rilbe wrote:
> On 2011-12-30 21:43, Leyne, Sean wrote:
> > Dmitry,
> >
> >> Trunk is the ongoing development branch (formerly known as HEAD in CVS),
> >> i.e. transaction IDs are already unsigned long in FB 3.0.
> > Although this would mean a further ODS change as well as an increase in overhead associated with all rows, perhaps the ODS size should be increased from 4 bytes to 5 bytes to remove any possible likelihood of overflowing the max value (=256 transactions per sec, continuously for over 136 years!)
> Who knows what will happen within only 5-10 years? Perhaps in five
> years, it will be common with systems running a few thousand
> transactions per second? In that case 40 bits will only suffice for
> about 11 years (3000 trans/sec).
>
> If such a "big" change is to be made I suggest to make it at least 48
> bits, and why not 64 bits while we're at it?

This will make each version of the record (not the record, but EACH
version of it) 4 bytes longer. For tables with small records that means
severe performance degradation.
Dmitry Yemanov
2011-12-31 14:08:59 UTC
Permalink
31.12.2011 17:57, Alexander Peshkov wrote:

> This will make each version of the record (not the record- but EACH
> version of it) 4 bytes longer.

Not strictly necessary. We could use a variable-length encoding for txn
ids longer than 32 bits and mark such records with a new flag. It would
add zero storage/performance overhead for all current applications
but allow longer txn ids at a slightly higher cost. It would increase
the code complexity, though.
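
A minimal sketch of what such an encoding might look like (the flag name and
the helper functions are invented for illustration; this is not the actual
record format):

#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical flag bit in the record header, using one of the spare bits.
const std::uint16_t REC_large_tra = 0x0100;

// Store a transaction id: 4 bytes when it fits in 32 bits (today's case),
// 8 bytes plus the flag for larger ids. Readers branch on the flag.
std::size_t write_tra_id(std::uint8_t* out, std::uint16_t& flags, std::uint64_t id)
{
    if (id <= 0xFFFFFFFFu)
    {
        const std::uint32_t small = static_cast<std::uint32_t>(id);
        std::memcpy(out, &small, sizeof(small));   // on-disk size unchanged
        return sizeof(small);
    }
    flags |= REC_large_tra;                        // mark the record as "wide"
    std::memcpy(out, &id, sizeof(id));             // pay 4 extra bytes only here
    return sizeof(id);
}

std::uint64_t read_tra_id(const std::uint8_t* in, std::uint16_t flags)
{
    if (flags & REC_large_tra)
    {
        std::uint64_t id;
        std::memcpy(&id, in, sizeof(id));
        return id;
    }
    std::uint32_t id;
    std::memcpy(&id, in, sizeof(id));
    return id;
}

Existing records keep their current layout; only records written with an id
above 2^32-1 would grow by four bytes.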


Dmitry
Alexander Peshkov
2011-12-31 14:15:56 UTC
Permalink
On Sat, 31/12/2011 at 18:08 +0400, Dmitry Yemanov wrote:
> 31.12.2011 17:57, Alexander Peshkov wrote:
>
> > This will make each version of the record (not the record- but EACH
> > version of it) 4 bytes longer.
>
> Not strictly necessary. We could use a variable-length encoding for txn
> ids longer than 32 bits and mark such records with a new flag. It would
> add zero storage/performance overhead for all the current applications
> but allow longer txn ids for the slightly bigger cost. It would increase
> the code complexity though.
>

Maybe simply use 64-bit numbers with a new flag? Variable-length encoding
is not very good from a performance POV.
Kjell Rilbe
2012-01-01 10:34:00 UTC
Permalink
On 2011-12-31 15:08, Dmitry Yemanov wrote:
> 31.12.2011 17:57, Alexander Peshkov wrote:
>
>> This will make each version of the record (not the record- but EACH
>> version of it) 4 bytes longer.
> Not strictly necessary. We could use a variable-length encoding for txn
> ids longer than 32 bits and mark such records with a new flag. It would
> add zero storage/performance overhead for all the current applications
> but allow longer txn ids for the slightly bigger cost. It would increase
> the code complexity though.

Are there unused flag bits available in the current record format, that
could be used for this purpose? Or how are such flag bits encoded?

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Dmitry Yemanov
2012-01-01 11:50:09 UTC
Permalink
01.01.2012 14:34, Kjell Rilbe wrote:

> Are there unused flag bits available in the current record format, that
> could be used for this purpose? Or how are such flag bits encoded?

We have 7 or 8 bits (out of 16) currently available.


Dmitry
Jesus Garcia
2012-01-01 12:48:43 UTC
Permalink
>
> We have 7 or 8 bits (out of 16) currently available.
>
>
> Dmitry

Isn't it better to use a 64-bit integer? In the near future computer speeds, HDD speeds, etc. will increase, and the overhead of a 64-bit transaction id will affect performance less and less.

Jesus
Dmitry Yemanov
2012-01-01 13:06:01 UTC
Permalink
01.01.2012 16:48, Jesus Garcia wrote:
>
> Is not better use 64 bits integer?. In the near future computers speed, hdd speeds, etc. will increase, and transaction id of 64 bits overhead will affect less and less to performance.

Disk speed has always been the issue; it doesn't increase as fast as
CPU/memory speed. So the on-disk size always matters quite a lot.


Dmitry
Dimitry Sibiryakov
2012-01-01 13:09:24 UTC
Permalink
01.01.2012 14:06, Dmitry Yemanov wrote:
> Disk speed was always the issue, it doesn't increase that fast as
> CPU/memory speed. So the on-disk size always matters quite a lot.

CPU speed also has stopped growing.

--
SY, SD.
Jesús García
2012-01-01 16:31:02 UTC
Permalink
>
> CPU speed also has stopped growing.
>

You are right, speed has stopped but performance has increased.

Jesus
Jesús García
2012-01-01 17:11:34 UTC
Permalink
>
> Disk speed was always the issue, it doesn't increase that fast as
> CPU/memory speed. So the on-disk size always matters quite a lot.
>
>
> Dmitry

I think the question is whether it is necessary and good for Firebird, and whether the pros of having 64-bit transaction ids outweigh the cons of not having them. From my POV, the 32-bit id is an important restriction in the engine, and I would prefer not to have it.

The flag option is good because only the databases that need high transaction numbers pay for it, but as you wrote, the engine code becomes more complicated.

My customer with the highest transaction numbers could, at the moment, run for 7-8 years without a stop, but I would sleep better if I didn't have to worry about that, and if I didn't have to explain that restriction to my customer, which could make him think that Firebird is not as good as I tell him.

Firebird 3 is growing, and now may be the moment to change it. FB 3 will live for years, and systems require more and more. I used Firebird 1 years ago, but now our clients with higher workloads couldn't work with that version.

Is there any measurement of the real impact of the change?

Jesus
Adriano dos Santos Fernandes
2012-01-01 17:22:40 UTC
Permalink
On 01-01-2012 15:11, Jesús García wrote:
>
> I think the question is if it is neccesary and good for Firebird, and if the pros of having transactions id of 64bits are better that the cons of not having it.

Let something like sweep consolidate old transactions in only one,
concurrently with user operations.
Dimitry Sibiryakov
2012-01-01 18:32:17 UTC
Permalink
01.01.2012 18:22, Adriano dos Santos Fernandes wrote:
> Let something like sweep consolidate old transactions in only one,
> concurrently with user operations.

Sweep is a mostly read-only operation which is known to cause intolerable slowdown.
Everybody turns it off because of that. You are suggesting running a read-write operation on
the whole database concurrently. I'm afraid that, compared with this, backup-restore is a
"lesser evil". At least a faster one.

--
SY, SD.
Yi Lu
2012-01-02 19:34:47 UTC
Permalink
We turn off the automatic sweep option and manually perform a sweep (as well
as recomputing index selectivity) once a day at a low-traffic hour (2am).
Performance is actually not impacted that much.

james
2012-01-01 19:34:47 UTC
Permalink
>Nobody needs more than 4 billions active transactions concurrently.

But that's not what you'll get, because you can complete transactions
and get a hole, as it were. Can a process grab a transaction id and hold
onto it 'for ever' without messing up the system in ways other than
preventing the reclaim of ids?

Have to say, I can't see a good reason to use 32-bit ids. While
physical disk seek speeds are not increasing, many disks can go rather
fast, and so long as you don't seek, IO of larger amounts of data isn't
too costly. I guess the biggest effect is to reduce the effective cache
size in caching disk controllers. And 'disks' certainly are getting
much, much faster as we see SSD take-up.
Kjell Rilbe
2012-01-01 20:31:45 UTC
Permalink
So far, these are the suggestions I've seen (I'm adding my own
suggestion at the bottom):

1. Implement 64 bit id:s and use them everywhere always.
Pros:
- Straightforward once all code has been changed to use 64 bit id:s.
- No overhead for various special flags and stuff.
Cons:
- Will increase disk space even for DB:s that don't need 64 bit id:s.

2. Implement 64 bit id:s but use them only in records where required.
Use a per-record flag to indicate whether the transaction id is 32 bit or 64 bit.
Pros:
- Will not increase disk space for DB:s that don't need 64 bit id:s.
- For DB:s that do need 64 bit id:s, only records with high transaction
id:s will take up the additional 4 bytes of space.
Cons:
- More complicated implementation than suggestion 1.
- Runtime overhead for flag checking on each record access.

3. Stay at 32 bit id:s but somehow "compact" all id:s older than OIT(?)
into OIT-1. Allow id:s to wrap around.
Pros:
- No extra disk space.
- Probably rather simple code changes for id usage.
Cons:
- May be difficult to find an effective way to compact old id:s.
- Even if an effective way to compact old id:s is found, it may be
complicated to implement.
- May be difficult or impossible to perform compacting without large
amounts of write locks.
- Potential for problems if OIT isn't incremented (but such OIT problems
should be solved anyway).


May I suggest a fourth option?

4. Add a DB wide flag, like page size, indicating if this DB uses 32 bit
or 64 bit id:s. Could be changed via backup restore cycle.
Pros:
- Almost as simple as suggestion 1 once all code has been changed to
support 64 bit id:s.
- Minimal overhead for flag checking.
- DB:s that don't need 64 bit id:s won't need to waste disk space on 64
bit id:s.
Cons:
- A DB that has been configured with the incorrect flag has to be backed
up + restored to change the flag, causing (a one-time) down time.
- A bit more complicated to implement and maintain than suggestion 1.

Questions:

Q1. Will it be extremely difficult to change all code to support 64 bit
id:s? So difficult that option 3 is worth investigating thoroughly? As
far as I can see, that's the only option to avoid 64 bit id:s.

Q2. If support for 64 bit id:s is implemented, how important is it to
conserve disk space in cases where 64 bit id:s are not required? And is
it important to conserve disk space per record (suggestion 2) or is it
sufficient to conserve disk space only for databases where 32 bit id:s
suffice (suggestion 4)?

Q3. For suggestion 4: the flag has to be "loaded" only on connect, but
in the 32 bit case I would assume each record access would cast the
loaded 32 bit id to 64 bit. Would the overhead for that logic and cast
be noticeable and large enough to be a problem? I assume the overhead would
be less than that of the per-record flag as per suggestion 2, but... maybe not?
If suggestion 2 is comparable in code complexity and runtime
overhead to suggestion 4, then I see no reason to go for
suggestion 4.
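
To make Q3 a bit more concrete, here is a rough sketch of how a per-database
flag (suggestion 4) might be handled by choosing a reader once at attach time
instead of testing a flag on every record; all the names below are invented
for illustration:

#include <cstdint>
#include <cstring>

// Hypothetical per-attachment state, filled in once from the header page.
struct Attachment
{
    bool wideTransactionIds;                          // the suggestion 4 flag
    std::uint64_t (*readTraId)(const std::uint8_t*);  // chosen once, used per record
};

static std::uint64_t read_tra_32(const std::uint8_t* p)
{
    std::uint32_t id;
    std::memcpy(&id, p, sizeof(id));
    return id;                                        // widened on the fly
}

static std::uint64_t read_tra_64(const std::uint8_t* p)
{
    std::uint64_t id;
    std::memcpy(&id, p, sizeof(id));
    return id;
}

void on_attach(Attachment& att, bool headerSaysWideIds)
{
    att.wideTransactionIds = headerSaysWideIds;
    att.readTraId = headerSaysWideIds ? read_tra_64 : read_tra_32;
}

The per-record cost is then one indirect call (or an inlined branch on a
cached bool), rather than a flag stored in every record version.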

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Dimitry Sibiryakov
2012-01-01 22:25:42 UTC
Permalink
01.01.2012 21:31, Kjell Rilbe wrote:
> So far, these are the suggestions I've seen

Hey, how about my suggestion to leave everything as is and perform a backup-restore cycle on
demand?..

Pros:
- No overhead.
- No additional disk space.
- No code change.
Cons:
- DBA with brain is required.

--
SY, SD.
Kjell Rilbe
2012-01-01 22:43:46 UTC
Permalink
On 2012-01-01 23:25, Dimitry Sibiryakov wrote:
> 01.01.2012 21:31, Kjell Rilbe wrote:
>> So far, these are the suggestions I've seen
> Hey, how about my suggestion leave everything as is and perform backup-restore cycle on
> demand?..
>
> Pros:
> - No overhead.
> - No additional disk space.
> - No code change.
> Cons:
> - DBA with brain is required.
Yes, of course! But you're forgetting this:
Con:
- If no downtime is acceptable, it requires special measures to be
able to keep the system running during a backup-restore cycle, e.g. a cluster.

Personally I don't know much about clustering or FB replication, so I
can't really say if this is a big problem or not. But from what I've
read in this thread it is not without problems.

And I think we can live without the sarcasm inherent in your con.

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Dimitry Sibiryakov
2012-01-02 10:09:39 UTC
Permalink
01.01.2012 23:43, Kjell Rilbe wrote:
> On 2012-01-01 23:25, Dimitry Sibiryakov wrote:
>> 01.01.2012 21:31, Kjell Rilbe wrote:
>>> So far, these are the suggestions I've seen
>> Hey, how about my suggestion leave everything as is and perform backup-restore cycle on
>> demand?..
>>
>> Pros:
>> - No overhead.
>> - No additional disk space.
>> - No code change.
>> Cons:
>> - DBA with brain is required.
> Yes, of course! But you're forgetting this:
> Con:
> - If the no downtime is acceptable, it requires special measures to be
> able to keep the system running during a backup-restore cycle, e.g. cluster.

A single server without downtime is a myth anyway.
And 24/365 systems cannot live without a DBA in any case.

--
SY, SD.
Jesus Garcia
2012-01-02 13:46:38 UTC
Permalink
>
> Single server without downtime is a myth anyway.
The problem is not downtime, it is how much downtime. Backup and restore means a lot of downtime.

Jesus
Dimitry Sibiryakov
2012-01-02 14:50:41 UTC
Permalink
02.01.2012 14:46, Jesus Garcia wrote:
> he problem is not downtime is how much downtime. Backup and restore is so much downtime.

No more downtime than a simple restore after complete server destruction by flood or
plane crash.

--
SY, SD.
Mark Rotteveel
2012-01-02 15:10:10 UTC
Permalink
On 2-1-2012 15:50, Dimitry Sibiryakov wrote:
> 02.01.2012 14:46, Jesus Garcia wrote:
>> he problem is not downtime is how much downtime. Backup and restore is so much downtime.
>
> Not more downtime that simple restore after complete server destruction by flood or
> plane crash.

Could you stop with the absurd comparisons? One is normal maintenance
and the other is (extreme) disaster recovery which are in no way comparable.

Mark

--
Mark Rotteveel
Dimitry Sibiryakov
2012-01-02 15:43:22 UTC
Permalink
02.01.2012 16:10, Mark Rotteveel wrote:
> Could you stop with the absurd comparisons? One is normal maintenance
> and the other is (extreme) disaster recovery which are in no way comparable.

But downtime is downtime. Customers don't care about its reason, do they?..

--
SY, SD.
Mark Rotteveel
2012-01-02 15:53:18 UTC
Permalink
On 2-1-2012 16:43, Dimitry Sibiryakov wrote:
> 02.01.2012 16:10, Mark Rotteveel wrote:
>> Could you stop with the absurd comparisons? One is normal maintenance
>> and the other is (extreme) disaster recovery which are in no way comparable.
>
> But downtime is downtime. Customers don't care about its reason, do they?..

I think in general, customers are more forgiving if the downtime is
caused by a disaster than if it is caused by long-running maintenance.

Mark
--
Mark Rotteveel
Lester Caine
2012-01-02 16:40:43 UTC
Permalink
Dimitry Sibiryakov wrote:
>> Could you stop with the absurd comparisons? One is normal maintenance
>> > and the other is (extreme) disaster recovery which are in no way comparable.
> But downtime is downtime. Customers don't care about its reason, do they?..

In my own cases, ANY downtime during office hours is unacceptable, so any
failure that brings the whole system down would result in penalties! Organised
downtime is possible on many sites, but some sites are running 24/7, so we
maintain data in a manner such that the system will work with elements down, but
database operation must be maintained. Firebird has been running reliably for
many years on these sites, when other services HAVE crashed, to the extent that
services have been moving onto our framework due simply to its reliability :)

Customers care very much about downtime ... especially if it HAS to happen
simply for maintenance reasons.

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Dimitry Sibiryakov
2012-01-02 16:53:28 UTC
Permalink
02.01.2012 17:40, Lester Caine wrote:
> In my own cases, ANY downtime during office hours is unacceptable, so any
> failure that brings the whole system down would result in penalties! Organised
> downtime is possible on many sites, but some sites are running 24/7, So we
> maintain data in a manor that the system will work with elements down, but the
> database operation must be maintained. Firebird has been running reliably for
> many years on these sites, when other services HAVE crashed, to the extent that
> services have been moving onto our framework due simply to it's reliability:)
>
> Customers care very much about downtime ... especially if it HAS to happen
> simply for maintenance reasons.

But maintenance doesn't inevitably cause downtime. Maintenance of one piece of a system
can't stop the whole system if the other pieces do all the work. This is the main idea behind
RAID 1-6, for example. What's wrong with your system if it cannot work when just one
part of it is missing?..

--
SY, SD.

PS: For me it is rather strange to be discussing such basic principles of reliability here...
Lester Caine
2012-01-02 20:55:56 UTC
Permalink
Dimitry Sibiryakov wrote:
>> In my own cases, ANY downtime during office hours is unacceptable, so any
>> > failure that brings the whole system down would result in penalties! Organised
>> > downtime is possible on many sites, but some sites are running 24/7, So we
>> > maintain data in a manor that the system will work with elements down, but the
>> > database operation must be maintained. Firebird has been running reliably for
>> > many years on these sites, when other services HAVE crashed, to the extent that
>> > services have been moving onto our framework due simply to it's reliability:)
>> >
>> > Customers care very much about downtime ... especially if it HAS to happen
>> > simply for maintenance reasons.
> But maintenance doesn't inevitable cause downtime. Maintenance of one piece of a system
> can't stop whole system if other pieces will do all the job. This is the main idea behind
> RAID1-6, for example. What's wrong with your system if it cannot work without only one
> part of it?..

Backup and restore take a finite and growing amount of time with 10+ years' worth
of data. It is rare to need to run a cycle, but when the need does arise then it
has to be handled. The point about losing part of the system relates to things
like losing a ticket printer or display device ... not critical, since the users
can work around the losses, and as long as *A* copy of the database can be seen
by a web server, they can continue to work. I've even had RAID systems fail in
the past, so nowadays it's a lot more reliable to have simple duplicate machines
on the system, each capable of providing the services needed.

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Lester Caine
2012-01-02 15:31:39 UTC
Permalink
Mark Rotteveel wrote:
>> 02.01.2012 14:46, Jesus Garcia wrote:
>>> >> he problem is not downtime is how much downtime. Backup and restore is so much downtime.
>> >
>> > Not more downtime that simple restore after complete server destruction by flood or
>> > plane crash.
> Could you stop with the absurd comparisons? One is normal maintenance
> and the other is (extreme) disaster recovery which are in no way comparable.

And if you have a decent recovery setup, then you simply switch to a remote
backup machine, which allows as much time as you like for repairs! Taking users
offline for a period of time is simply not practical in a lot of cases, so we
need to be able to manage how things are handled during any maintenance cycle.

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Lester Caine
2012-01-01 22:47:56 UTC
Permalink
Dimitry Sibiryakov wrote:
> 01.01.2012 21:31, Kjell Rilbe wrote:
>> > So far, these are the suggestions I've seen
> Hey, how about my suggestion leave everything as is and perform backup-restore cycle on
> demand?..
>
> Pros:
> - No overhead.
> - No additional disk space.
> - No code change.
> Cons:
> - DBA with brain is required.
... Requires downtime? Which can be substantial.
Unless the configuration can be set up to allow updates applied while the
backup-restore cycle runs to be reapplied afterwards?

--
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk//
Firebird - http://www.firebirdsql.org/index.php
Kjell Rilbe
2012-01-02 05:14:17 UTC
Permalink
On 2012-01-01 23:47, Lester Caine wrote:
> Dimitry Sibiryakov wrote:
>> 01.01.2012 21:31, Kjell Rilbe wrote:
>>>> So far, these are the suggestions I've seen
>> Hey, how about my suggestion leave everything as is and perform backup-restore cycle on
>> demand?..
>>
>> Pros:
>> - No overhead.
>> - No additional disk space.
>> - No code change.
>> Cons:
>> - DBA with brain is required.
> ... Requires a downtime? Which can be substantial
> Unless the configuration can be set up to allow updates applied following the
> backup-restore cycle to be reapplied?

Interesting proposition in its own right. :-) I'm thinking that nbackup
does something along those lines, but on the page level. Would it be
very difficult to implement something similar but on the record level?

Avoiding downtime during backup/restore might be a more common
requirement than the 64 bit txn id issue.

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Yi Lu
2012-01-02 19:32:45 UTC
Permalink
Approach 1 seems to be the least risky. Disk space should not be a big issue with
today's hardware.

Ann Harrison
2012-01-02 19:49:52 UTC
Permalink
On Mon, Jan 2, 2012 at 2:32 PM, Yi Lu <***@ramsoft.biz> wrote:
> Approach 1 seems to be least risky. Disk space should not be a big issue with
> today's hardware.
>

The problem is not disk space, but locality of reference. With small
records, adding four bytes could decrease the number of rows per page by
10-15%, leading to more disk I/O. That's a significant cost to every
database, which can be avoided by using a slightly more complicated
variable-length transaction id, or a flag that indicates which size is used
for a particular record. Firebird already checks the flags to determine
whether a record is fragmented, so the extra check adds negligible overhead.
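
A quick illustration of where a figure like 10-15% comes from; the usable page
space and record size below are assumptions picked for the arithmetic, not
Firebird constants:

#include <cstdio>

int main()
{
    const int usablePageBytes = 4000;   // assumed usable space on a data page
    const int smallRecordBytes = 30;    // assumed on-page size of a small record version

    const int rowsBefore = usablePageBytes / smallRecordBytes;        // ~133
    const int rowsAfter  = usablePageBytes / (smallRecordBytes + 4);  // ~117

    std::printf("rows per page: %d -> %d (%.0f%% fewer)\n",
                rowsBefore, rowsAfter,
                100.0 * (rowsBefore - rowsAfter) / rowsBefore);
    return 0;
}

Roughly 12% fewer rows per page, which means correspondingly more pages to
read for the same data and therefore more I/O.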

And, as an aside, sweeping is not read-only now. Its purpose is to
remove unneeded old versions of records from the database. The actual
work may be done by the garbage collect thread, but the I/O is there.

Good luck,

Ann
Yi Lu
2012-01-03 01:02:03 UTC
Permalink
We are trying to take approach #1, involving an ODS change and data type
changes so that the transaction id uses a signed 64-bit integer. We just
started looking at the code today, and so far the following changes would be
applied (see the sketch below):
1. hdr_oldest_transaction, hdr_oldest_active and hdr_next_transaction in the
header_page structure should become SINT64 in ods.h.
2. fld_trans_id should have dtype_int64 and its size should be sizeof(SINT64)
in fields.h.
3. Signatures of functions and variables which use those values should be
changed.
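
Here is a rough sketch of what item 1 could look like (the structure is
heavily abbreviated and the typedef is a stand-in; the real ods.h has many
more members, and the layout change is exactly what makes this an ODS change):

// Simplified excerpt for discussion only -- not the real ods.h contents.
typedef long long SINT64;            // stand-in for Firebird's 64-bit signed alias

struct header_page
{
    // ... preceding page-header fields elided ...
    SINT64 hdr_oldest_transaction;   // was 32-bit: oldest interesting transaction (OIT)
    SINT64 hdr_oldest_active;        // was 32-bit: oldest active transaction (OAT)
    SINT64 hdr_next_transaction;     // was 32-bit: next transaction id to assign
    // ... remaining fields elided ...
};

// Widening these fields shifts the offsets of everything after them, so existing
// databases would need a backup/restore or an upgrade path, and the transaction
// inventory pages and record headers that also carry transaction ids need the
// same treatment.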

I am wondering if we are on the right track with this approach? Did we miss
any important ODS changes?

Also, we would be willing to contribute to the Firebird Foundation in order
to get help developing this feature, as we can't wait until Firebird 3.0.

Ann Harrison
2012-01-03 03:53:28 UTC
Permalink
>
> 1. hdr_oldest_transaction, hdr)oldest_active, hdr_next_transaction on
> header_page structure should be SINT64 on file ods.h.
> 2. fld_trans_id should have dtype_int64 and size should be sizeof(SINT64) on

Why signed?

Ann
>
Yi Lu
2012-01-03 15:09:17 UTC
Permalink
We have read that negative transaction IDs are used to indicate an invalid
transaction. Thus, we have decided to continue with the signed approach to
prevent logic problems.

Philippe Makowski
2012-01-03 06:54:29 UTC
Permalink
Yi Lu [2012-01-03 02:02] :
> Also, we would be willing to contribute to the Firebird foundation in order
> to help us to develop this feature as we can't wait until Firebird 3.0.

Please contact me directly so we can see what your needs and intentions are
and find the best solution.

--
Philippe Makowski
Kjell Rilbe
2012-01-03 20:04:11 UTC
Permalink
On 2012-01-01 21:31, Kjell Rilbe wrote:
> 3. Stay at 32 bit id:s but somehow "compact" all id:s older than OIT(?)
> into OIT-1. Allow id:s to wrap around.
> Pros:
> - No extra disk space.
> - Probably rather simple code changes for id usage.
> Cons:
> - May be difficult to find an effective way to compact old id:s.
> - Even if an effective way to compact old id:s is found, it may be
> complicated to implement.
> - May be difficult or impossible to perform compacting without large
> amounts of write locks.
> - Potential for problems if OIT isn't incremented (but such OIT problems
> should be solved anyway).
I'm thinking about this solution, in case a solution is actually needed
(see recent subthread).

I assume the sweep only looks at record versions that are deleted, and
"marks them" (?) for garbage collection if they have a transaction id less
than the OIT. Correct?

This is not sufficient for the "consolidation" of old transaction id:s.

What's needed is, in principle, a task that reads through ALL record
versions and, for each one with transaction id < OIT, changes it to OIT -
1. When it has done that for the entire database, it can move the max
usable transaction id to OIT - 2.
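
In pseudocode terms, something like the sketch below (all the types are
invented stand-ins just to pin the algorithm down; they are not Firebird's
real page or record structures):

#include <cstdint>
#include <vector>

using TraNumber = std::uint32_t;

struct RecordVersion { TraNumber transaction_id; };
struct DataPage      { std::vector<RecordVersion> versions; bool dirty; };
struct Database      { std::vector<DataPage> pages; TraNumber maxUsableId; };

// One consolidation pass: every version older than the OIT is rewritten to
// OIT - 1, after which ids below OIT - 1 are free to be reused on wraparound.
// In the real engine each page would have to be fetched with an exclusive
// latch, modified, marked dirty and written back by the page cache.
void consolidate(Database& db, TraNumber oit)
{
    for (DataPage& page : db.pages)
    {
        for (RecordVersion& rv : page.versions)
        {
            if (rv.transaction_id < oit)
            {
                rv.transaction_id = oit - 1;
                page.dirty = true;
            }
        }
    }
    db.maxUsableId = oit - 2;   // the new wraparound boundary
}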

Then it can wait until the database starts to exhaust the "transaction
id space" again before repeating the cycle.

To make this a bit less work intensive, would it be possible and a good
idea to mark each database page with the lowest transaction id in use on
that page? In that case, the task could skip all pages where this value
is >= OIT - 1. But would it require a lot of overhead to keep this info
up to date? I don't know how a page is used to access a record on it....
But doesn't cooperative garbage collection mean that on each page
access, all deleted record versions on that page are marked for garbage
collection? In that case I assume it will read the transaction id and
deletion state of all record versions on the page anyway, and that's all
that's needed to keep the page's lowest transaction id up to date. Or am
I missing something (likely...)?

I assume the lowest transaction id on a page can never become lower, and
a new page will always have a "near-the-tip" lowest transaction id. So
the consolidation task would not have to re-check a page that is updated
after the task checks it but before the task completes the cycle.

But how to make sure it checks all pages? Is there any well-defined
order in which it could check all pages, without running the risk of
missing some of them, even if the database is "live"?

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Dimitry Sibiryakov
2012-01-03 20:11:09 UTC
Permalink
03.01.2012 21:04, Kjell Rilbe wrote:
> What's needed is, in principle, a task that reads through ALL record
> versions, and for each one with transaction id< OIT, change it to OIT -
> 1. When it has done that for the entire database, it can move the max
> useable transaction id to OIT - 2.

It means fetching/reading every page with an exclusive lock, modifying it, marking it as dirty
and writing it back to disk (when?). Crazy I/O load and long lock waits are guaranteed.

--
SY, SD.
Kjell Rilbe
2012-01-03 20:39:38 UTC
Permalink
On 2012-01-03 21:11, Dimitry Sibiryakov wrote:
> 03.01.2012 21:04, Kjell Rilbe wrote:
>> What's needed is, in principle, a task that reads through ALL record
>> versions, and for each one with transaction id< OIT, change it to OIT -
>> 1. When it has done that for the entire database, it can move the max
>> useable transaction id to OIT - 2.
> It means to fetch/read every page with exclusive lock, modify it, mark it as a dirty
> and write it back to the disk (when?). Crazy i/o load and long lock waits are guaranteed.
>
Yes, but perhaps it's more tolerable to have a somewhat slower system
for two days than to have 1 day of downtime? Assuming the
cluster/replication solution is not used...

And I would assume that it would only need to lock a single page at a
time, and that it would take a very short time to do the job on that
single page. So, while it would incur a lot of write locks on a lot of
pages, there will only be a single page lock at a time, and the lock
will be very short lived. So, no long waits, but a whole lot of very
short waits.

Still, I am not really arguing that this is the best solution. Seems to
me you're right Dimitry, that a replication/cluster solution is better.
But could that perhaps be made a bit easier?

I consider myself to be pretty good at SQL and at high level system
development, but I've never set up a cluster or FB replication. So, for
me this is a real stumbling block. On top of all other things I need to
do yesterday, I would also have to learn how to set this up. I'm sure
I'm not the only one...

So, what could be done with FB to make it easier to back up and restore a
live database, then sync the new copy with all updates made since the start
of the backup, and finally bring the copy live instead of the old one?

Perhaps this is what would be "best spent devel resources" to solve the
issue?

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Dmitry Yemanov
2012-01-03 20:14:52 UTC
Permalink
04.01.2012 0:04, Kjell Rilbe wrote:

> To make this a bit less work intensive, would it be possible and a good
> idea to mark each database page with the lowest transaction id in use on
> that page? In that case, the task could skip all pages where this value
> is>= OIT - 1.

It could avoid page writes but not page reads. The latter could also be
achievable, but the solution is going to be even more complicated.


Dmitry
Yi Lu
2012-01-04 00:08:44 UTC
Permalink
How do I debug the Firebird server running with the code changes that we made? I
am running fbserver.exe with "-a -p 3050 -database
"localhost:..\myDatabase.fdb"". The Firebird tray icon shows me that
nothing has been attached and nothing is happening outside of the Windows
message loop. Also, when parsing the debug parameters, the parser did not
pick up the database parameter.

Basically, I am wondering if I can find a document describing how to debug,
a reference guide, or any other kind of helpful document for debugging Firebird
code changes. We are using Visual Studio 2005.

Vlad Khorsun
2012-01-04 18:45:48 UTC
Permalink
> How do I debug if firebird server running with code changes that we made?

Are you asking how to debug applications? Using a debugger ;)

> I am running fbserver.exe with "-a -p 3050 -database
> "localhost:..\myDatabase.fdb"". Firebird tray icon is showing me that
> nothing has been attached and nothing is happening outside of the windows
> message loop. Also, when parsing the debug parameter, the parser did not
> pick up the database parameter.

Have you changed the command-line parser to support a "database" switch?
I won't ask why you need such a switch...

> Basically, I am wondering if I can find a document describing how to debug,
> reference guide or any kind of helpful document to debug Firebird code
> changes. We are using Visual Studio 2005.

So, use the VS docs. Firebird is an application like any other, and there is
no separate description of how to debug Firebird.

Regards,
Vlad

PS Or I have misunderstood you completely ;)
W O
2012-01-01 00:07:39 UTC
Permalink
You are right, Alexander, but with computers getting faster and faster every month,
is that really a problem today?

Greetings.

Walter.



On Sat, Dec 31, 2011 at 9:57 AM, Alexander Peshkov <***@mail.ru> wrote:

> On Fri, 30/12/2011 at 23:10 +0100, Kjell Rilbe wrote:
> > On 2011-12-30 21:43, Leyne, Sean wrote:
> > > Dmitry,
> > >
> > >> Trunk is the ongoing development branch (formerly known as HEAD in
> CVS),
> > >> i.e. transaction IDs are already unsigned long in FB 3.0.
> > > Although this would mean a further ODS change as well as an increase
> in overhead associated with all rows, perhaps the ODS size should be
> increased from 4 bytes to 5 bytes to remove any possible likelihood of
> overflowing the max value (=256 transactions per sec, continuously for over
> 136 years!)
> > Who knows what will happen within only 5-10 years? Perhaps in five
> > years, it will be common with systems running a few thousand
> > transactions per second? In that case 40 bits will only suffice for
> > about 11 years (3000 trans/sec).
> >
> > If such a "big" change is to be made I suggest to make it at least 48
> > bits, and why not 64 bits while we're at it?
>
> This will make each version of the record (not the record- but EACH
> version of it) 4 bytes longer. For table with small records that means
> severe performance degradation.
>
>
>
>
> ------------------------------------------------------------------------------
> Ridiculously easy VDI. With Citrix VDI-in-a-Box, you don't need a complex
> infrastructure or vast IT resources to deliver seamless, secure access to
> virtual desktops. With this all-in-one solution, easily deploy virtual
> desktops for less than the cost of PCs and save 60% on VDI infrastructure
> costs. Try it free! http://p.sf.net/sfu/Citrix-VDIinabox
> Firebird-Devel mailing list, web interface at
> https://lists.sourceforge.net/lists/listinfo/firebird-devel
>
Alex Peshkoff
2012-01-03 13:13:32 UTC
Permalink
On 01/01/12 04:07, W O wrote:
> You are right Alexander but with computers each month more and more fast,
> that's really a problem today?

Practice says that it is still a problem, at least at the user's brain level :)
We increased the record number size in FB2, and I used to hear many times:
well, why is fb2 slower than fb1.5 on some operations on the same server?
Alex Peshkoff
2011-12-30 07:47:45 UTC
Permalink
On 12/29/11 22:05, Leyne, Sean wrote:

>> The reason for signed type here was (at least visible reason) very simple
>> - into some functions using same parameter might
>> be passed transaction number (positive value) and something else (negative
>> value). I.e. negative sign meant 'this is not transaction' and function behaved
>> according to it.
> And some people have complained about some of my suggestions as being "hacks"!!!
>

Sean, certainly it was a hack, but it has been in the codebase since pre-Firebird
times. In fb3 it was cleaned up. We try to remove such 'solutions' from
the code, and we certainly do not want to add new ones.
Jesus Garcia
2011-12-29 19:41:07 UTC
Permalink
> This change was done in trunk. The reason for signed type here was (at
> least visible reason) very simple - into some functions using same
> parameter might be passed transaction number (positive value) and
> something else (negative value). I.e. negative sign meant 'this is not
> transaction' and function behaved according to it.
>
Would it not be better, instead of that, to say: if the transaction id is equal to 0, it is not a transaction, otherwise it is?

As there is now a problem with transaction ids on heavily loaded systems, that could go a little way towards solving the problem.

Jesus
Dimitry Sibiryakov
2011-12-29 20:49:07 UTC
Permalink
29.12.2011 20:41, Jesus Garcia wrote:
> Would not be better, instead of that, If transaction id is equal To 0, no transaction, else transaction.

There is transaction number zero.

> As now there is a problem with transactionid and heavy loaded systems, that could solve in a little the problem.

Heavily loaded systems should use clusters. That's all. While one node is down for maintenance,
the others do all the work. These systems need clusters anyway for high availability and/or load
balancing. One can't seriously run critical systems on a single server.

--
SY, SD.
Jesus Garcia
2011-12-29 21:09:40 UTC
Permalink
> Heavy loaded systems should use clusters. That's all. While one node is on maintenance,
> others do all work. These systems need clusters anyway for high availability and/or load
> balancing. One can't be serious running critical systems on a single server.
>
I don't agree with you. You generalize, and that is not always good.

From your POV, what is not serious is having an "enterprise database" with such a serious and important restriction.

I had not read, until your posts, that a Firebird cluster is necessary for heavily loaded systems and, what is worse, that whoever doesn't use one is not serious.

Replication is not easy with Firebird, and it is not native to the engine. There are many problems with replication depending on which Firebird features are used. Using Firebird just to store data in tables is different from programming business logic in the database and using user restrictions, triggers, etc.

I do not use any other RDBMS, but is there the same problem with Postgres, Oracle, SQL Server, Informix, DB2, etc.?

Don't misinterpret my words, I love Firebird.

Jesus
Leyne, Sean
2011-12-29 21:49:10 UTC
Permalink
> There is transaction number zero.

Actually, all databases are initialized with transaction #1 as the starting value, so there should not be any transaction 0.


Sean
Leyne, Sean
2012-01-03 17:51:30 UTC
Permalink
Jesus

> > Single server without downtime is a myth anyway.
> The problem is not downtime is how much downtime. Backup and restore is
> so much downtime.

If that is the case, how much downtime is acceptable?

There are a couple of possible solutions which would reduce the downtime:
- a new backup/restore tool which would use multiple readers/writers to minimize execution time,
- a "data port" utility which would allow data to be ported from a live database to a new database while the live one is active, but would need a finalization step where the live database is shut down to apply the final data changes and add the FK constraints.

There are, however, certain "realities" which cannot be overcome: disk throughput/IO performance.


Sean
Dimitry Sibiryakov
2012-01-03 18:26:10 UTC
Permalink
03.01.2012 18:51, Leyne, Sean wrote:
> - a "data port" utility which would allow for data to be ported from a live database to a new database while live is active but would need a finalization step where the live database is shutdown to apply the final data changes and add FK constraints.

And exactly this utility is called a "replicator". If made right, it doesn't need FK
deactivation and can do the "finalization step" when the new database is already in use.
Aren't you tired of reinventing the wheel?..

--
SY, SD.
Ann Harrison
2012-01-03 18:44:45 UTC
Permalink
Dimitry,

>> - a "data port" utility which would allow for data to be ported from a live database to a new database while live is active but would need a finalization step where the live database is shutdown to apply the final data changes and add FK constraints.
>
>   And exactly this utility is called "replicator". If made right, it doesn't need FK
> deactivation and can do "finalization step" when new database is already in use.
>   Aren't you tired inventing a wheel?..

Different vehicles need different wheels. The wheels on my bicycle
wouldn't do at all for a cog-railway and cog-railway wheels work very
badly on airplanes. Airplane wheels are no use at all in a
grandfather clock. Engineering is all about creating new wheels.
Right now, what we're looking for is a wheel that can reset
transaction ids. I'm not sure that either replication or the
mechanism Sean is proposing (similar to either the start of a shadow
database or nbackup) can solve the overflowing transaction id problem.

Cheers,

Ann
Dimitry Sibiryakov
2012-01-03 19:08:24 UTC
Permalink
03.01.2012 19:44, Ann Harrison wrote:
> I'm not sure that either replication or the
> mechanism Sean is proposing (similar to either the start of a shadow
> database or nbackup) can solve the overflowing transaction id problem.

Simply: let's say we have two synchronized databases. One is primary and the second is...
well... secondary.
When the transaction count reaches, say, 1,000,000,000 transactions, we shut down replication and
perform a backup-restore of the secondary database. Then we resume replication, and after
some time we again have two synchronized databases. In the primary database the transaction counter
is big; in the secondary it is low.
Now we switch the roles. The primary database becomes the secondary and vice versa. After that
we can repeat the previous step to reset the transaction counter in the ex-primary database without
stopping the whole system.
Voila.

--
SY, SD.
Woody
2012-01-03 19:14:44 UTC
Permalink
From: "Ann Harrison" <***@nuodb.com>
> Dimitry,
>
>> And exactly this utility is called "replicator". If made right, it
>> doesn't need FK
>> deactivation and can do "finalization step" when new database is already
>> in use.
>> Aren't you tired inventing a wheel?..
>
> Different vehicles need different wheels. The wheels on my bicycle
> wouldn't do at all for a cog-railway and cog-railway wheels work very
> badly on airplanes. Airplane wheels are no use at all in a
> grandfather clock. Engineering is all about creating new wheels.
> Right now, what we're looking for is a wheel that can reset
> transaction ids. I'm not sure that either replication or the
> mechanism Sean is proposing (similar to either the start of a shadow
> database or nbackup) can solve the overflowing transaction id problem.
>

Maybe I'm a little dense, (probably :), but doesn't FB already know what the
oldest interesting transaction id is? Why couldn't transaction numbers be
allowed to wrap back around up to that point? As long as transactions are
committed at some point, the oldest transaction would move and it would
solve most problems being run into now, IMO.

I will accept any and all ridicule if this seems idiotic since I really
don't know the code and haven't even looked at it. I'm just amazed and
impressed at how easy it is to set up and use FB in everything I do. :)

Woody (TMW)
Ann Harrison
2012-01-03 19:48:03 UTC
Permalink
Woody,

> Maybe I'm a little dense, (probably :), but doesn't FB already know what the
> oldest interesting transaction id is? Why couldn't transaction numbers be
> allowed to wrap back around up to that point? As long as transactions are
> committed at some point, the oldest transaction would move and it would
> solve most problems being run into now.

The oldest interesting transaction is the oldest one that is not known
to be committed. If the oldest interesting transaction is 34667, and
you're transaction 55778, you know that anything created by
transaction 123 is certain to be committed.

Now let's assume that you're transaction 4294967000 and the oldest
interesting transaction was 4294000000 when you started. (Probably
ought to mention that (IIRC) a transaction picks up the value of the
oldest interesting on startup.) Then the transaction counter
rolls around and some new transaction 3 starts creating new
versions... You know they're committed, so you read the new data.

More generally, how do you know the difference between the old
transaction 3 record versions, which you do need to read, and the new
transaction 3 record versions, which you don't want to read?
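
To put that ambiguity in code, a minimal sketch of the shortcut involved
(the names are hypothetical; this is not the engine's actual check):

#include <cstdint>

// Minimal sketch (hypothetical names): a reader treats any version created
// below its "oldest interesting" mark as committed without looking it up.
// Once the counter wraps, a brand new transaction 3 passes this test for
// every old reader, because the number alone cannot say whether "3" means
// the old transaction 3 or the new one.
bool assumedCommitted(uint32_t versionCreator, uint32_t readersOldestInteresting)
{
    return versionCreator < readersOldestInteresting;
}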

> I will accept any and all ridicule if this seems idiotic ...

Not at all idiotic. This stuff is complicated.


Cheers,

Ann
Kjell Rilbe
2012-01-03 19:49:25 UTC
Permalink
On 2012-01-03 20:14, Woody wrote:
> Maybe I'm a little dense, (probably :), but doesn't FB already know what the
> oldest interesting transaction id is? Why couldn't transaction numbers be
> allowed to wrap back around up to that point? As long as transactions are
> committed at some point, the oldest transaction would move and it would
> solve most problems being run into now, IMO.

As far as I understand, with very limited knowledge about the ods, each
version of each record contains the id of the transaction that created
that record version. So, if transaction with id 5 created the record
version it will say "5" in there, always, as long as that record version
is still in existence.

Now, old record versions get garbage collected. If the record version is
not the current one and its id is lower than the OIT, the disk space for
that record version is marked as free.

But this happens ONLY for record versions that are not "current".
Consider a lookup table that's created when the database is created.
Those records will possibly never change. Same goes for log records. So,
that old "5" will stay there forever, regardless of the OIT.

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Dmitry Yemanov
2012-01-03 20:06:16 UTC
Permalink
03.01.2012 23:49, Kjell Rilbe wrote:

> As far as I understand, with very limited knowledge about the ods, each
> version of each record contains the id of the transaction that created
> that record version. So, if transaction with id 5 created the record
> version it will say "5" in there, always, as long as that record version
> is still in existence.
>
> Now, old record versions get garbage collected. If the record version is
> not the current one and its id is lower than the OIT, the disk space for
> that record version is marked as free.
>
> But this happens ONLY for transaction version that are not "current".
> Consider a lookup table that's created when the database is created.
> Those records will possibly never change. Same goes for log records. So,
> that old "5" will stay there forever, regardless of the OIT.

Theoretically, it could be worked around. A regular or manually started
"something-like-a-sweep-but-different" activity could visit all the
committed record versions and reset their txn IDs to e.g. OIT-1, thus
making the ID space denser and causing a wrap around to be more-or-less
safe.

But it's going to be terribly slow (almost all data pages have to be
modified). Also, with wrapping allowed, the whole logic that handles
txn IDs would turn out to be much more complicated (simple checks like
MAX(ID1, ID2) won't work anymore).
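
A purely illustrative sketch of what such a pass would amount to
(hypothetical types and names, nothing like the real sweep code):

#include <cstdint>
#include <vector>

// Every transaction below the OIT is known to be committed, so their record
// versions can all be restamped with OIT-1, making the live id space denser.
// The painful part is that every data page holding such a version has to be
// read, modified and written back.
struct VersionStub
{
    uint32_t creatorTxn;   // id of the transaction that created this version
};

void freezeOldVersions(std::vector<VersionStub>& versions, uint32_t oit)
{
    for (VersionStub& v : versions)
        if (v.creatorTxn < oit)        // below the OIT => known committed
            v.creatorTxn = oit - 1;
}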

In the past, I liked the idea to wrap the txn IDs, but now I'm more and
more keen to consider other solutions instead.


Dmitry
Ann Harrison
2012-01-03 20:59:59 UTC
Permalink
On Tue, Jan 3, 2012 at 3:06 PM, Dmitry Yemanov <***@yandex.ru> wrote:

>
> In the past, I liked the idea to wrap the txn IDs, but now I'm more and
> more keen to consider other solutions instead.
>
Completely agree - including having changed my mind. I agree with
Dimitry S. that his replicate/backup/restore/reverse strategy works,
assuming that the load is light enough that the newly restored
replicant can eventually catch up. At the same time, one of
Firebird's strong points is the limited amount of expertise necessary
to manage the system, so in the longer run, either a variable-length
transaction id or a per-record-version flag indicating
the size of the transaction id has a lot of merit.
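
As a very rough sketch of the second idea (hypothetical names and layout,
not a proposal for the actual ODS):

#include <cstdint>

// Sketch only: one flag bit in the record-version header says whether the
// creator transaction id is stored as 32 or 64 bits, so existing data keeps
// its compact ids and only versions written after the counter outgrows the
// 32-bit range pay for the wider field.
const uint8_t RV_WIDE_TXN = 0x01;

struct VersionHeader
{
    uint8_t  flags;
    uint32_t txnLow;     // low 32 bits of the creator transaction id
    uint32_t txnHigh;    // meaningful only when RV_WIDE_TXN is set
};

uint64_t creatorTxn(const VersionHeader& h)
{
    return (h.flags & RV_WIDE_TXN)
        ? (uint64_t(h.txnHigh) << 32) | h.txnLow
        : h.txnLow;
}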

Best regards,

Ann
Kjell Rilbe
2012-01-03 21:34:58 UTC
Permalink
On 2012-01-03 21:59, Ann Harrison wrote:
> On Tue, Jan 3, 2012 at 3:06 PM, Dmitry Yemanov<***@yandex.ru> wrote:
>
>> In the past, I liked the idea to wrap the txn IDs, but now I'm more and
>> more keen to consider other solutions instead.
>>
> Completely agree - including having changed my mind. I agree with
> Dimitry S. that his replicate/backup/restore/reverse strategy works,
> assuming that the load is light enough that the newly restored
> replicant can eventually catch up. At the same time, one of
> Firebird's strong points is the limited amount of expertise necessary
> to manage the system, so in the longer run, either a variable length
> transaction id or a record version by record version flag indicating
> the size of the transaction id has a lot of merit.

Or more automated, built-in support to do such a
replicate/backup/restore/reverse. For me it's a question of time. Sure, I
could learn how to set up a cluster and replication. But there are dozens
of other things I also need to do yesterday, so having to learn this on
top of everything else is a stumbling block.

Could the procedure be "packaged" into some kind of utility program or
something?

I'm thinking that nbackup locks the master file while keeping track of
changed pages in a separate file. Perhaps a transaction id consolidation
similar to what happens on backup/restore could be performed on a locked
database master while logging updates in a separate file, and then the
consolidated master could be brought up to date again.

If this is very difficult, perhaps there's no point - devel resources
better spent elsewhere. But if it would be a fairly simple task...?

Kjell

--
--------------------------------------
Kjell Rilbe
DataDIA AB
E-post: ***@datadia.se
Telefon: 08-761 06 55
Mobil: 0733-44 24 64
Ann Harrison
2012-01-03 22:09:14 UTC
Permalink
Kjell,


>
> Or a more automated and built-in support to do such a
> replicate/backup/restore/reverse. For me it's question of time. Sure, I
> could learn how to setup a cluster and replication. But there are dozens
> of other things I also need to do yesterday, so having to learn this on
> top of everything else is a stumbling block.
>
> Could the procedure be "packaged" into some kind of utility program or
> something?

The short answer is probably just "No." Could someone build a robot
that would identify a flat tire, take your spare tire out of your
trunk, jack up your car, remove the flat, put on the spare, lower the
care, and put the flat tire back in the trunk? Probably. Would it be
easier than learning to change a tire? Somewhat unlikely. On a
heavily loaded system, the replicated database (replicant in my
jargon) can't share a disk and set of set of cpu's with the primary
database. (That's the trunk part of the analogy.) Once established,
the replicant has to create a foundation copy of the primary database
(jacking up the car), then process updates until it's approximately
current with the primary database (removing the old tire),
then initiate a backup/restore, wait for the restore to complete
successfully (install the new tire), swap in the newly created database
and catch up to the primary again (lower the car). Finally, once the
newly restored replicant is absolutely current, the system must
quiesce for a few seconds to swap primary and replicant databases
(getting the old tire into the trunk).

>
> I'm thinking that nbackup locks the master file while keeping track of
> changed pages in a separate file. Perhaps a transaction id consolidation
> similar to what happens on backup/restore could be performed on a locked
> database master while logging updates in a separate file, and then bring
> the consolidated master up to date again.

nbackup works at the page level, which is simpler than handling record-level
changes. Unlike records, pages never go away, nor do they
change their primary identifier.


> If this is very difficult, perhaps there's no point - devel resources
> better spent elsewhere. But if it would be a fairly simple task...?

Alas, I doubt that it's simple.


Best regards,

Ann
Ann Harrison
2012-01-03 18:36:25 UTC
Permalink
Sean,
>
>> The problem is not downtime, it is how much downtime. Backup and restore is
>> so much downtime.
>
> There are a couple of possible solutions which would reduce the downtime;
> - a new backup/restore tool which would use multiple readers/writers to minimize execution time,

Here we're talking about a logical backup that can be used to restart
transaction numbers. Record numbers are based loosely on record
storage location. Since a logical backup/restore changes storage
location and thus record numbers, and indexes link values to record
numbers, indexes must be recreated.

The problem with a multi-threaded logical backup is that all the
threads contend for the same I/O bandwidth and possibly the same CPU
time. Much of the restore time is spent sorting keys to recreate
indexes and multiple threads would contend for the same temporary disk
I/O.


> - a "data port" utility which would allow for data to be ported from a live database to a new database while live is active but would need a finalization step where the live database is shutdown to apply the final data changes and add FK constraints.

It's not immediately obvious to me how that sort of backup/restore
could reset transaction numbers.

> There are, however, certain "realities" which cannot be overcome; disk throughput/IO performance.
>
>
Thomas Steinmaurer
2012-01-03 19:49:43 UTC
Permalink
>>> The problem is not downtime, it is how much downtime. Backup and restore is
>>> so much downtime.
>>
>> There are a couple of possible solutions which would reduce the downtime;
>> - a new backup/restore tool which would use multiple readers/writers to minimize execution time,
>
> Here we're talking about a logical backup that can be used to restart
> transaction numbers. Record numbers are based loosely on record
> storage location. Since a logical backup/restore changes storage
> location and thus record numbers and indexes link values to record
> numbers, indexes must be recreated.
>
> The problem with a multi-threaded logical backup is that all the
> threads contend for the same I/O bandwidth and possibly the same CPU
> time. Much of the restore time is spent sorting keys to recreate
> indexes and multiple threads would contend for the same temporary disk
> I/O.

While the restore process is pretty much I/O bound, creating indices
loves RAM, as I have seen in some tests I made in the past. So
the restore might get a speed-up when more RAM can be utilized. There is a
-bu(ffers) option for the restore, but I think this really just overrides the
database page buffers rather than acting as a larger temporary page cache.

Another option might be to restore to a RAM disk, if the database file
fits onto it, and then move the restored database to persistent storage.

>> - a "data port" utility which would allow for data to be ported from a live database to a new database while live is active but would need a finalization step where the live database is shutdown to apply the final data changes and add FK constraints.
>
> It's not immediately obvious to me how that sort of backup/restore
> could reset transaction numbers.
>
>> There are, however, certain "realities" which cannot be overcome; disk throughput/IO performance.

True, but things get better if we can do more in RAM that
would otherwise go to disk, especially temporary data, e.g. when creating
indices.

Regards,
Thomas
Leyne, Sean
2012-01-03 20:09:53 UTC
Permalink
Thomas,

> While the restore process is pretty much I/O bound,

That is true for desktop PCs with HDDs (not SSDs) or servers without a caching RAID controller, but it is certainly not true as a blanket statement.

There is plenty of room for more throughput.


> Another option might be to restore to a RAM disk, if the database file fits
> onto it and then move the restored database to a persistent storage.

Agreed, and a restore process with multiple writers and multiple index rebuilds would be of even more significant benefit!

A RAM-disk-based backup/restore is possible today without any change to FB. I am referring to a truly kick-butt restore process.


Sean
Leyne, Sean
2012-01-03 19:37:38 UTC
Permalink
> 03.01.2012 18:51, Leyne, Sean wrote:
> > - a "data port" utility which would allow for data to be ported from a live
> database to a new database while live is active but would need a finalization
> step where the live database is shutdown to apply the final data changes and
> add FK constraints.
>
> And exactly this utility is called "replicator". If made right, it doesn't need FK
> deactivation and can do "finalization step" when new database is already in
> use.
> Aren't you tired inventing a wheel?..

As Ann poetically replied, there are different wheels for different vehicles.

I was/am thinking that a replicator requires more setup than a multi-threaded data pump utility would. The problem is not a perpetual problem; it is a relatively one-time one.

As for the FKs, I see the tool as being one which needs to have maximum performance. FKs are required for referential integrity, but they slow down bulk move operations, since the FK values need to be checked. Further, since data integrity would already be enforced in the live database, the target database would not need FKs until the "go live" pre-launch.

Also, FKs would require that the database schema be analyzed to process tables/rows in the correct order, which is an obstacle to maximum performance.


Sean
Dimitry Sibiryakov
2012-01-03 19:50:05 UTC
Permalink
03.01.2012 20:37, Leyne, Sean wrote:
> As for the FKs, I see the tool as being one which needs to have maximum performance.

If you are going to move the whole database at once - yes. Fortunately, in most cases it is
not necessary.

--
SY, SD.
Yi Lu
2012-01-13 19:57:28 UTC
Permalink
Hi All,
When we try to change the “header page” structure in file ods.h
(for example, hdr_next_transaction from SLONG to SINT64),
we get the following error:
C:\Program Files\Firebird\Firebird_2_5\bin>gbak -c -r -user sysdba
-password masterkey -service localhost:service_mgr "C:\Program
Files\Firebird\Firebird_2_5\EMPLOYEE.FBK" "C:\Program
Files\Firebird\Firebird_2_5\employee1.fdb"
gbak: ERROR:cannot attach to password database
gbak:Exiting before completion due to errors
The detailed error in firebird.log is the following:
Error in isc_attach_database() API call when working with security database
file C:\PROGRAM FILES\FIREBIRD\FIREBIRD_2_5\SECURITY2.FDB is not a valid
database

We have the following questions:
1. How do we build a new security database SECURITY2.FDB that is compatible
with the changes in ods.h?
2. What additional changes should we make in the source code to be compatible
with the changes in ods.h?
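
For reference, the change we are attempting is essentially widening one
field of the on-disk header page, roughly as below (an abbreviated sketch:
the field order is simplified and the typedefs are shown only to keep it
self-contained; see the real declaration in ods.h):

typedef int       SLONG;    // roughly as in the Firebird headers: 32-bit signed
typedef long long SINT64;   // 64-bit signed

// Widening this field changes the size and offsets of the on-disk header
// page, so databases created with the old layout - including SECURITY2.FDB -
// can no longer be read by the modified engine and have to be recreated.
struct header_page
{
    // ... preceding fields unchanged ...
    SLONG  hdr_oldest_transaction;   // oldest interesting transaction
    SLONG  hdr_oldest_active;        // oldest active transaction
    SINT64 hdr_next_transaction;     // was: SLONG hdr_next_transaction;
    // ... remaining fields unchanged ...
};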
Thanks in advance



--
View this message in context: http://firebird.1100200.n4.nabble.com/Firebird-Transaction-ID-limit-solution-tp4230210p4293219.html
Sent from the firebird-devel mailing list archive at Nabble.com.
Alex Peshkoff
2012-01-16 06:14:41 UTC
Permalink
On 01/13/12 23:57, Yi Lu wrote:
> Hi All,
> When we are trying to change structure “header page” in file ods.h
> (For example hdr_next_transaction from SLONG to SINT64)
> We have the following error
> C:\Program Files\Firebird\Firebird_2_5\bin>gbak -c -r -user sysdba
> -password masterkey -service localhost:service_mgr "C:\Program
> Files\Firebird\Firebird_2_5\EMPLOYEE.FBK" "C:\Program
> Files\Firebird\Firebird_2_5\employee1.fdb"
> gbak: ERROR:cannot attach to password database
> gbak:Exiting before completion due to errors
> Detail error in the firebird.log is following:
> Error in isc_attach_database() API call when working with security database
> file C:\PROGRAM FILES\FIREBIRD\FIREBIRD_2_5\SECURITY2.FDB is not a valid
> database
>
> We have the questions as following:
> 1. How build new security database SECURITY2.FDB that is compatible to
> The Changes in ODS.h file

You should run a boot build with the changed ODS; the correct SECURITY2.FDB will
be created during the boot build.

> 2. What is additional changes should we make in Source Code to be compatible
> With changes in ODS.h file.

That's what you should find and do to make it work :)

> Thanks in advance
>
u***@libero.it
2012-01-16 13:24:09 UTC
Permalink
Hi all,

I've read many comments on this subject, but it seems there are not many ideas to
solve this problem (if it really is a problem in the near future).

I remember that in the 8086 era, when addressable memory became cheaper and bigger
than the size reserved in the opcode space (24 bits, IIRC), the hardware architecture
changed a bit, implementing relocation tables, where the low bits were the
displacement within a page and the higher bits were instead used as a pointer into a
page table containing the base address of the real memory page.

As a parallel, in our case not many transaction IDs between 0 and the next
transaction number are important, interesting or whatever you want to call them;
a lot of them could be ... "forgotten".

I know that transaction management is not implemented in hardware, but having
such a table in memory could help, and it shouldn't be very big.
AFAIU many classes use transaction information, and the transaction ID
relocation algebra would probably have to be overloaded in the transaction class
(I don't know if there is any such class; this is completely hypothetical reasoning).
With a scheme like this, is there any chance to re-compact them? If a page of
"relocatable IDs" has no more "pending" transactions, it could be discarded and
the next one could move down to occupy the freed "address space"; this probably
needs a use count, with all the good and the bad that brings.
Anyway, as they are implemented now, it's not a simple task, because (and
maybe here I'm very wrong) they are used for different purposes: marking the
different versions of a record as they are created AND time-ordering any
correlated or unrelated DDL or DML change in the database (AND maybe other
things I don't remember now or am not aware of).
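
To make the relocation idea a bit more concrete, a toy sketch (purely
hypothetical, nothing to do with how the engine stores anything today):

#include <cstdint>
#include <map>

// Toy sketch: the 32-bit values actually written into record versions become
// "slots", and a persistent table maps each slot to the real, ever-growing
// 64-bit transaction id.  A slot whose transactions have all been garbage
// collected could in principle be reused.
class TxnRelocationTable
{
public:
    uint32_t allocateSlot(uint64_t realTxnId)
    {
        const uint32_t slot = nextSlot++;   // reuse/compaction ignored here
        slots[slot] = realTxnId;
        return slot;
    }

    uint64_t realId(uint32_t slot) const { return slots.at(slot); }

private:
    std::map<uint32_t, uint64_t> slots;     // would have to live on disk
    uint32_t nextSlot = 0;
};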

Perhaps this discussion would be better placed in fb-architect, but searching for
"transaction id" I found other, more philosophical discussions there. And in the
tracker... nothing at all? A note should be somewhere, but I cannot find it right
now.

Comments?

Ciao.
Mimmo.
Ann Harrison
2012-01-16 16:51:00 UTC
Permalink
On Mon, Jan 16, 2012 at 8:24 AM, ***@libero.it
<***@libero.it> wrote:
>
> I remember that in 8086 era, when addressable memory became cheaper and bigger
> than dimension reserved in opcode space (24 bits IIRC), hardware architecture
> changed a bit, implementing relocation tables, where low level bits where the
> displacement in a page and higher bits instead where used as a pointer in a
> page table containing the base address of the real memory page.


A relocation table for transactions is an interesting idea. Let me
explain how transaction
identifiers are used. It's very simple, and, given that transaction
numbers generally go up
over time, some major simplification is possible.

Record versions are identified by transaction number of the
transaction that created them.

Transactions that don't insert, update, or delete records are of no
interest to anyone once
they end. The long term value in transaction identifiers is only to
identify record versions.

Transactions have four states: committed, rolled back, limbo, and
active. The first two
are pretty obvious.

Limbo is the state that a transaction in a two-phase commit enters when it has
agreed that it can roll back or commit as necessary, with no
possibility of error on either
path. Normally, transactions are in limbo very briefly until their
partner transactions all
agree on a direction and commit or rollback. The problem that
two-phase commit solves
is a system crash during a commit on several sites. If that happens
some of the sites
may have committed and the other sites must commit the transactions in
limbo when
they recover.

Active is more interesting than it looks. When allocating a new
transaction inventory
page, Firebird sets every transaction to active. When Firebird
restarts after a crash or
shutdown it searches the transaction inventory pages for transactions
between the
oldest active and the next transaction. If it finds any transactions
in the active state,
it changes them to rolled back.

This leads to a couple of simplifying assumptions.

When a transaction starts, it can be certain that all transactions with higher
transaction numbers are active.

Transactions keep track of the state of older transactions down to the
"oldest interesting"
which is the first transaction left in the database that did not
commit. Sweeping the database
moves the "oldest interesting" by removing changes made by
transactions that rolled back.

Older record versions have lower transaction numbers than newer
versions. Figuring out
what record version to use is pretty easy. Find the first version
that was committed when
the reading transaction started.
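
In rough pseudo-code terms (hypothetical names, a simplified snapshot,
limbo and many other details ignored):

#include <cstdint>

// Simplified sketch of version selection: walk the chain from newest to
// oldest and take the first version whose creator was already committed in
// the reader's snapshot.
struct Version
{
    uint32_t       creatorTxn;
    const Version* older;        // next older version in the chain, or nullptr
};

struct Snapshot
{
    uint32_t highWater;          // stand-in for the reader's real committed set
    bool committed(uint32_t txn) const { return txn < highWater; }
};

const Version* visibleVersion(const Version* newest, const Snapshot& snap)
{
    for (const Version* v = newest; v != nullptr; v = v->older)
        if (snap.committed(v->creatorTxn))
            return v;
    return nullptr;              // nothing visible to this reader
}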

>
> As a parallel in our case, not many transaction IDs are important, interesting
> or whatever you want call them, between 0 and next transaction number, a lot of
> them could be ... "forgotten".

OK. So how would you do a remap?

Reusing the identifiers of transactions that created record versions
that are still in the
database is a bad idea. Resetting all the transaction identifiers of
mature records (records
without back versions) is possible but would require a mega-sweep
(vacuum?) that would
be slow, and, unfortunately, cannot reset the most recent transaction
identifiers.


However, the final state of transactions that don't make changes is
kept on transaction inventory
pages. Those numbers could be reused if those transactions could be
identified and if the code
could be changed so a new transaction 128 understood that transactions
129 - 4,000,000,000
are committed and that changes by transaction 100 are newer than
changes by transaction
3,999,999,970, but that changes by transaction 3 are older.
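
If wrapping were ever allowed, every such comparison would have to become
something like serial-number arithmetic, and even that breaks down for old
record versions whose ids were never rewritten. A sketch, purely as an
illustration and not anything Firebird does:

#include <cassert>
#include <cstdint>

// Wrap-aware ordering in the style of RFC 1982 serial-number arithmetic:
// two ids compare by their distance modulo 2^32.
bool isOlder(uint32_t a, uint32_t b)          // true if a happened before b
{
    return a != b && uint32_t(b - a) < 0x80000000u;
}

int main()
{
    // Just after a wrap this gets the easy case right: 3,999,999,970 is
    // older than the new transaction 100.
    assert(isOlder(3999999970u, 100u));

    // But an ancient record version still stamped "3" now compares as newer
    // than 3,999,999,970, which is wrong - the number alone cannot say which
    // trip around the counter it belongs to.
    assert(!isOlder(3u, 3999999970u));
    return 0;
}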

>
> I know that transaction management is not implemented in hardware, but having
> this in memory could help and shouldn't be very big in size.

I think a relocation table mapping unused transaction numbers in the
32-bit range
to virtual transaction numbers in the 64-bit range could be pretty
big. And it would
have to be permanent, durable, on disk, and current.

> AFAIU many classes uses transaction information and probably transaction ID
> relocation algebra should be overloaded in the transaction class (don't know if
> are there any such class, it's completely a hypothetical reasoning).

The relative age of record versions is critically important to any
MVCC system. The
"relocation algebra" has to be permanent.

> With a management like this, are any chance to re-compact them? If a page of
> "relocatable IDs" has no more "pending" transaction, should be discarded and
> the next can downgrade to occupy the freed "address space": this maybe needs a
> use count, with the good and the bad it has.

Err, those transaction identifiers are written into record versions on
disk. They
can't be released without rewriting the pages.

> Anyway, as are they are implemented now, it's not a simple task, because, and
> maybe here i'm very wrong, they are used for different purposes: mark different
> versions of a record as they are created AND time ordering any correlated on
> unrelated change in DDL or DML of the database (AND maybe other things I don't
> remember now or I'm not aware).

No, there's no correlation between transaction id's and DDL operations. Changes
to table definitions are managed with record format versions.

Having thought about it for a minute, relocation of transaction
identifiers sounds hard,
error prone, and likely to lead to slower and slower operations as the
number of available
identifiers shrinks.

Cheers,

Ann
W O
2012-01-17 03:17:35 UTC
Permalink
Doug Chamberlin
2012-01-17 15:18:51 UTC
Permalink
It seems Oracle has a related issue with their SCN values:

http://www.infoworld.com/d/security/revealed-fundamental-oracle-flaw-184163?source=IFWNLE_nlt_daily_2012-01-17
unknown
1970-01-01 00:00:00 UTC
Permalink

Excellent Ann, you have explained it very clearly (as usual).

Greetings.

Walter.


